cal theory—Duffie (1992), Huang and Litzenberger (1988), and Ingersoll
(1987), for example—but no equivalent textbook for empirical methods.

During the same period, we participated in research conferences on 
Financial Markets and Monetary Economics, held under the auspices of the 
National Bureau of Economic Research in Cambridge, Massachusetts. Many
of the papers that captured our attention at these meetings involved new
econometric methods or new empirical findings in financial economics. We
felt that this was some of the most exciting research being done in finance,
and that students should be exposed to this material at an early stage.

In 1989 we began to discuss the idea of writing a book that would cover 
econometric methods as applied to finance, along with some of the more 
prominent empirical results in this area. We began writing in earnest in
1991, completing this arduous project five years and almost six hundred
pages later. This book is considerably longer than we had originally planned,
but we have finally overcome the temptation to include just one more new
topic, and have put our pens to rest. Of course, the academic literature has
as this book goes to press. We have attempted to provide broad coverage,
but even so, there are many subjects that we do not touch upon, and many 
others that we can only mention in passing. 
sibly acknowledge. Throughout our professional careers our colleagues and
mentors have offered us advice, debate, inspiration, and friendship; we wish
to thank in particular Andy Abel, Ben Bernanke, Steve Cecchetti, John Cox,
Angus Deaton, Gene Fama, Bruce Grundy, Jerry Hausman, Chi-fu Huang,

The Econometrics of Financial Markets


Introduction 


FINANCIAL ECONOMICS is a highly empirical discipline, perhaps the most
empirical among the branches of economics and even among the social
sciences in general. This should come as no surprise, for financial markets
are not mere figments of theoretical abstraction; they thrive in practice
and play a crucial role in the stability and growth of the global economy.
Therefore, although some aspects of the academic finance literature may
seem abstract at first, there is a practical relevance demanded of financial
models that is often waived for the models of other comparable disciplines.

Despite the empirical nature of financial economics, like the other so-
cial sciences it is almost entirely nonexperimental. Therefore, the primary
method of inference for the financial economist is model-based statistical 
inference—financial econometrics. While econometrics is also essential in 
other branches of economics, what distinguishes financial economics is the 
central role that uncertainty plays in both financial theory and its empirical 
implementation. The starting point for every financial model is the uncer- 
tainty facing investors, and the substance of every financial model involves 
the impact of uncertainty on the behavior of investors and, ultimately, on 
market prices. Indeed, in the absence of uncertainty, the problems of fi-
nancial economics reduce to exercises in basic microeconomics. The very
existence of financial economics as a discipline is predicated on uncertainty. 

This has important consequences for financial econometrics. The ran-
dom fluctuations that require the use of statistical theory to estimate and test
financial models are intimately related to the uncertainty on which those 
models are based. For example, the martingale model for asset prices has 
very specific implications for the behavior of test statistics such as the au-
tocorrelation coefficient of price increments (see Chapter 2). This close
connection between theory and empirical analysis is unparalleled in the
social sciences, although it has been the hallmark of the natural sciences
for quite some time. It is one of the most rewarding aspects of financial
econometrics, so much so that we felt impelled to write this graduate-level
textbook as a means of introducing others to this exciting field.

Bernstein (1992) provides a highly readable account of the interplay between theory and
practice in the development of modern financial economics.

Section 1.1 explains which topics we cover in this book, and how we have 
organized the material. We also suggest some ways in which the book might 
be used in a one-semester course on financial econometrics or empirical 
finance. 

In Section 1.2, we describe the kinds of background material that are
most useful for financial econometrics and suggest references for those
readers who wish to review or learn such material along the way. In our
experience, students are often more highly motivated to pick up the nec-
essary background after they see how it is to be applied, so we encourage
readers with a serious interest in financial econometrics but with somewhat 
less preparation to take a crack at this material anyway. 

In a book of this magnitude, notation becomes a nontrivial challenge 
of coordination; hence Section 1.3 describes what method there is in our 
notational madness. We urge readers to review this carefully to minimize 
the confusion that can arise when β̂ is mistaken for β and X is incorrectly
assumed to be the same as X. 

Section 1.4 extends our discussion of notation by presenting notational
conventions for and definitions of some of the fundamental objects of our 
study: prices, returns, methods of compounding, and probability distribu- 
tions. Although much of this material is well-known to finance students and 
investment professionals, we think a brief review will help many readers. 

In Section 1.5, we turn our attention to quite a different subject: the
Efficient Markets Hypothesis. Because so much attention has been lavished 
on this hypothesis, often at the expense of other more substantive issues, 
we wish to dispense with this issue first. Much of the debate involves theo- 
logical tenets that are empirically undecidable and, therefore, beyond the 
purview of this text. But for completeness—no self-respecting finance text 
could omit market efficiency altogether—Section 1.5 briefly discusses the 
topic. 


1.1 Organization of the Book 


In organizing this book, we have followed two general principles. First, the 
early chapters concentrate exclusively on stock markets. Although many of
the methods discussed can be applied equally well to other asset markets, the 
empirical literature on stock markets is particularly large and by focusing on 
these markets we are able to keep the discussion concrete. In later chapters, 
we cover derivative securities (Chapters 9 and 12) and fixed-income securi-
ties (Chapters 10 and 11). The last chapter of the book presents nonlinear
methods, with applications to both stocks and derivatives.

Second, we start by presenting statistical models of asset returns, and 
then discuss more highly structured economic models. In Chapter 2, for 
example, we discuss methods for predicting stock returns from their own
past history, without much attention to institutional detail; in Chapter 3 we 
show how the microstructure of stock markets affects the short-run behavior 
of returns. Similarly, in Chapter 4 we discuss simple statistical models of the
cross-section of individual stock returns, and the application of these models 
to event studies; in Chapters 5 and 6 we show how the Capital Asset Pricing 
Model and multifactor models such as the Arbitrage Pricing Theory restrict 
the parameters of the statistical models. In Chapter 7 we discuss longer-run 
evidence on the predictability of stock returns from variables other than
past stock returns; in Chapter 8 we explore dynamic equilibrium models 
which can generate persistent time-variation in expected returns. We use 
the same principle to divide a basic treatment of fixed-income securities 
in Chapter 10 from a discussion of equilibrium term-structure models in 
Chapter 11. 

We have tried to make each chapter as self-contained as possible. While 
some chapters naturally go together (e.g., Chapters 5 and 6, and Chapters 
10 and 11), there is certainly no need to read this book straight through 
from beginning to end. For classroom use, most teachers will find that there 
is too much material here to be covered in one semester. There are several 
ways to use the book in a one-semester course. For example, one teacher
might start by discussing short-run time-series behavior of stock prices using 
Chapters 2 and 3, then cover cross-sectional models in Chapters 4, 5, and 6, 
then discuss intertemporal equilibrium models using Chapter 8, and finally 
cover derivative securities and nonlinear methods as advanced topics using 
Chapters 9 and 12. Another teacher might first present the evidence on 
short- and long-run predictability of stock returns using Chapters 2 and 7, 
then discuss static and intertemporal equilibrium theory using Chapters 5, 
6, and 8, and finally cover fixed-income securities using Chapters 10 and 11. 

There are some important topics that we have not been able to include
in this text. Most obviously, our focus is almost exclusively on US domestic
asset markets. We say very little about asset markets in other countries, and
we do not try to cover international topics such as exchange-rate behav-
ior or the home-bias puzzle (the tendency for each country's investors to
hold a disproportionate share of their own country's assets in their portfo-
lios). We also omit such important econometric subjects as Bayesian analysis
and frequency-domain methods of time-series analysis. In many cases our
choice of topics has been influenced by the dual objectives of the book:
to explain the methods of financial econometrics, and to review the em-
pirical literature in finance. We have tended to concentrate on topics that
involve econometric issues, sometimes at the expense of other equally inter-
esting material—including much recent work in behavioral finance—that
is econometrically more straightforward. 


1.2 Useful Background


The many rewards of financial econometrics come at a price. A solid back- 
ground in mathematics, probability and statistics, and finance theory is nec- 
essary for the practicing financial econometrician, for precisely the reasons 
that make financial econometrics such an engaging endeavor. To assist 
readers in obtaining this background (since only the most focused and di- 
rected of students will have it already), we outline in this section the topics 
in mathematics, probability, statistics, and finance theory that have become 
indispensable to financial econometrics. We hope that this outline can serve 
as a self-study guide for the more enterprising readers and that it will be a 
partial substitute for including background material in this book. 


1.2.1 Mathematics Background


The mathematics background most useful for financial econometrics is not 
unlike the background necessary for econometrics in general: multivariate 
calculus, linear algebra, and matrix analysis. References for each of these
topics are Lang (1973), Strang (1976), and Magnus and Neudecker (1988),
respectively. Key concepts include

• multiple integration
• multivariate constrained optimization
• matrix algebra
• basic rules of matrix differentiation.


In addition, option- and other derivative-pricing models, and continuous-
time asset pricing models, require some passing familiarity with the Itô or
stochastic calculus. A lucid and thorough treatment is provided by Merton
(1990), who pioneered the application of stochastic calculus to financial
economics. More mathematically inclined readers may also wish to consult
Chung and Williams (1990).


1.2.2 Probability and Statistics Background


Basic probability theory is a prerequisite for any discipline in which uncer-
tainty is involved. Although probability theory has varying degrees of mathe-
matical sophistication, from coin-flipping calculations to measure-theoretic
foundations, perhaps the most useful approach is one that emphasizes the
intuition and subtleties of elementary probabilistic reasoning. An amaz-
ingly durable classic that takes just this approach is Feller (1968). Breiman
(1992) provides similar intuition but at a measure-theoretic level. Key con-
cepts include


• definition of a random variable
• independence
• distribution and density functions
• conditional probability
• modes of convergence
• laws of large numbers
• central limit theorems.


Statistics is, of course, the primary engine which drives the inferences 
that financial econometricians draw from the data. As with probability the-
ory, statistics can be taught at various levels of mathematical sophistication.
Moreover, unlike the narrower (and some would say “purer”) focus of proba-
bility theory, statistics has increased its breadth as it has matured, giving birth
to many well-defined subdisciplines such as multivariate analysis, nonpara-
metrics, time-series analysis, order statistics, analysis of variance, decision
theory, Bayesian statistics, etc. Each of these subdisciplines has been drawn
upon by financial econometricians at one time or another, making it rather
difficult to provide a single reference for all of these topics. Amazingly,
such a reference does exist: Stuart and Ord's (1987) three-volume tour de
force. A more compact reference that contains most of the relevant material
for our purposes is the elegant monograph by Silvey (1975). For topics in 
time-series analysis, Hamilton (1994) is an excellent comprehensive text. 
Key concepts include 


• Neyman-Pearson hypothesis testing
• linear regression
• maximum likelihood
• basic time-series analysis (stationarity, autoregressive and ARMA pro-
cesses, vector autoregressions, unit roots, etc.)
• elementary Bayesian inference.


For continuous-time financial models, an additional dose of stochastic pro- 
cesses is a must, at least at the level of Cox and Miller (1965) and Hoel, Port, 
and Stone (1972). 


1.2.3 Finance Theory Background 


Since the raison d'être of financial econometrics is the empirical implemen- 
tation and evaluation of financial models, a solid background in finance 
theory is the most important of all. Several texts provide excellent coverage 


of this material: Duffie (1992), Huang and Litzenberger (1988), Ingersoll
(1987), and Merton (1990). Key concepts include


“каутп aO es per lOc ttt n 

• static mean-variance portfolio theory
• the Capital Asset Pricing Model (CAPM) and the Arbitrage Pricing Theory (APT)
• dynamic asset pricing models
• option pricing theory.


1.3 Notation 


We have found that it is far from simple to devise a consistent notational 
scheme for a book of this scope. The difficulty comes from the fact that 
financial econometrics spans several very different strands of the finance 
literature, each replete with its own firmly established set of notational
conventions. But the conventions in one literature often conflict with the
conventions in another. Unavoidably, then, we must sacrifice either inter-
nal notational consistency across different chapters of this text or external
consistency with the notation used in the professional literature. We have 
chosen the former as the lesser evil, but we do maintain the following con- 
ventions throughout the book: 


• We use boldface for vectors and matrices, and regular face for scalars.
Where possible, we use bold uppercase for matrices and bold lowercase
for vectors. Thus x is a vector while X is a matrix.

• Where possible, we use uppercase letters for the levels of variables and
lowercase letters for the natural logarithms (logs) of the same variables.
Thus if P is an asset price, p is the log asset price.

• Our standard notation for an innovation is the Greek letter ε. Where
we need to define several different innovations, we use the alternative
Greek letters η, ξ, and ζ.

• Where possible, we use Greek letters to denote parameters or parameter
vectors.

• We use the Greek letter ι to denote a vector of ones.

• We use hats to denote sample estimates, so if β is a parameter, β̂ is an
estimate of β.

• When we use subscripts, we always use uppercase letters for the upper
limits of the subscripts. Where possible, we use the same letters for
upper limits as for the subscripts themselves. Thus subscript t runs
from 1 to T, subscript k runs from 1 to K, and so on. An exception is
that we will let subscript i (usually denoting an asset) run from 1 to N
because this notation is so common. We use t and τ for time subscripts;
i for asset subscripts; k, m, and n for lead and lag subscripts; and j as a
generic subscript.

• We use the timing convention that a variable is dated $t$ if it is known by
the end of period $t$. Thus $R_t$ denotes a return on an asset held from the
end of period $t-1$ to the end of period $t$.

• In writing variance-covariance matrices, we use Ω for the variance-
covariance matrix of asset returns, Σ for the variance-covariance matrix
of residuals from a time-series or cross-sectional model, and V for the
variance-covariance matrix of parameter estimators.

• We use script letters sparingly. 𝒩 denotes the normal distribution, and
ℒ denotes a log likelihood function.

• We use Pr(·) to denote the probability of an event.


The professional literature uses many specialized terms. Inevitably we
also use these frequently, and we italicize them when they first appear in the
book.


1.4 Prices, Returns, and Compounding


Virtually every aspect of financial economics involves returns, and there are at
least two reasons for focusing our attention on returns rather than on prices.
First, for the average investor, financial markets may be considered close to
perfectly competitive, so that the size of the investment does not affect price
changes. Therefore, since the investment “technology” is constant-returns-
to-scale, the return is a complete and scale-free summary of the investment
opportunity.

Second, for theoretical and empirical reasons that will become apparent
below, returns have more attractive statistical properties than prices, such
as stationarity and ergodicity. In particular, dynamic general-equilibrium
models often yield nonstationary prices, but stationary returns (see, for
example, Chapter 8 and Lucas [1978]).


1.4.1 Definitions and Conventions 


Denote by $P_t$ the price of an asset at date $t$ and assume for now that this asset
pays no dividends. The simple net return, $R_t$, on the asset between dates $t-1$
and $t$ is defined as

$$R_t \equiv \frac{P_t}{P_{t-1}} - 1. \tag{1.4.1}$$
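As an illustration of (1.4.1), the following minimal Python sketch converts a list of prices for a dividend-free asset into simple net returns. The function name `simple_returns` and the sample prices are hypothetical, chosen only for this example.

```python
def simple_returns(prices):
    """Simple net returns R_t = P_t / P_{t-1} - 1, as in (1.4.1).

    `prices` holds P_0, P_1, ..., P_T for an asset paying no dividends;
    the result has one element fewer than `prices`.
    """
    return [p_t / p_prev - 1.0 for p_prev, p_t in zip(prices[:-1], prices[1:])]


# Hypothetical price path: up 10%, then down 10%.
rets = simple_returns([100.0, 110.0, 99.0])
print(rets)  # approximately [0.10, -0.10]
```

A 10% gain followed by a 10% loss leaves the price at 99, below its starting value of 100, which is why single-period returns must be compounded rather than simply added.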


The simple gross return on the asset is just one plus the net return, $1 + R_t$.
From this definition it is apparent that the asset's gross return over the
most recent $k$ periods from date $t-k$ to date $t$, written $1 + R_t(k)$, is simply
equal to the product of the $k$ single-period returns from $t-k+1$ to $t$, i.e.,

$$1 + R_t(k) \equiv (1+R_t)\cdot(1+R_{t-1})\cdots(1+R_{t-k+1}) = \frac{P_t}{P_{t-1}}\cdot\frac{P_{t-1}}{P_{t-2}}\cdots\frac{P_{t-k+1}}{P_{t-k}} = \frac{P_t}{P_{t-k}}, \tag{1.4.2}$$
and its net return over the most recent $k$ periods, written $R_t(k)$, is simply
equal to its $k$-period gross return minus one. These multiperiod returns are
called compound returns.
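The telescoping product in (1.4.2) can be checked numerically. This sketch, with a hypothetical helper `compound_gross_return` and made-up prices, compounds single-period returns and compares the result with the ratio of the last price to the first.

```python
from math import prod


def compound_gross_return(simple_rets):
    """k-period gross return 1 + R_t(k) as the product of (1 + R) terms, per (1.4.2)."""
    return prod(1.0 + r for r in simple_rets)


prices = [100.0, 110.0, 99.0, 108.9]  # hypothetical prices P_{t-3}, ..., P_t
rets = [b / a - 1.0 for a, b in zip(prices[:-1], prices[1:])]

gross = compound_gross_return(rets)  # telescopes to P_t / P_{t-3} = 1.089
net = gross - 1.0                    # compound net return R_t(3)
print(gross, prices[-1] / prices[0], net)
```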
Although returns are scale-free, it should be emphasized that they are
not unitless, but are always defined with respect to some time interval, e.g.,
one “period.” In fact, $R_t$ is more properly called a rate of return, which is
more cumbersome terminology but more accurate in referring to $R_t$ as a rate
or, in economic jargon, a flow variable. Therefore, a return of 20% is not
a complete description of the investment opportunity without specification
of the return horizon. In the academic literature, the return horizon is
generally given explicitly, often as part of the data description, e.g., "The
CRSP monthly returns file was used."
However, among practitioners and in the financial press, a return
horizon of one year is usually assumed implicitly; hence, unless stated oth-
erwise, a return of 20% is generally taken to mean an annual return of 20%.
Multiyear returns are often annualized to make investments with
different horizons comparable, thus:


$$\text{Annualized}[R_t(k)] = \left[\prod_{j=0}^{k-1}(1+R_{t-j})\right]^{1/k} - 1. \tag{1.4.3}$$
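A short sketch of the geometric annualization (1.4.3), assuming each list element is a one-year simple return so that $k$ counts years. The helper name `annualized_return` and the two-year record are hypothetical.

```python
from math import prod


def annualized_return(annual_rets):
    """Geometric annualization, as in (1.4.3): [prod(1 + R_{t-j})]^(1/k) - 1."""
    k = len(annual_rets)
    return prod(1.0 + r for r in annual_rets) ** (1.0 / k) - 1.0


# Hypothetical two-year record: +21% in year one, 0% in year two.
print(annualized_return([0.21, 0.0]))  # sqrt(1.21) - 1, i.e. about 0.10
```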


Since single-period returns are generally small in magnitude, the following
approximation based on a first-order Taylor expansion is often used to
annualize multiyear returns:

$$\text{Annualized}[R_t(k)] \approx \frac{1}{k}\sum_{j=0}^{k-1} R_{t-j}. \tag{1.4.4}$$


Whether such an approximation is adequate depends on the particular 
application at hand; it may suffice for a quick and coarse comparison of 
investment performance across many assets, but for finer calculations in 
which the volatility of returns plays an important role, i.e., when the higher- 
order terms in the Taylor expansion are not negligible, the approximation 
(1.4.4) may break down. The only advantage of such an approximation is 
convenience—it is easier to calculate an arithmetic rather than a geomet- 
ric average—however, this advantage has diminished considerably with the 
advent of cheap and convenient computing power. 
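To see when (1.4.4) breaks down, the sketch below compares the exact geometric annualization with the arithmetic approximation on two hypothetical three-year return series. Because the arithmetic mean of gross returns is never below their geometric mean, the approximation can only overstate the exact figure, and the gap widens as volatility rises.

```python
from math import prod


def annualized_exact(rets):
    """Geometric average of (1 + R), equation (1.4.3)."""
    return prod(1.0 + r for r in rets) ** (1.0 / len(rets)) - 1.0


def annualized_approx(rets):
    """Arithmetic average, the first-order approximation (1.4.4)."""
    return sum(rets) / len(rets)


calm = [0.05, 0.06, 0.04]       # hypothetical low-volatility annual returns
volatile = [0.50, -0.40, 0.50]  # hypothetical high-volatility annual returns
for label, rets in (("calm", calm), ("volatile", volatile)):
    print(label, round(annualized_exact(rets), 4), round(annualized_approx(rets), 4))
```

For the calm series the two answers agree to about four decimal places; for the volatile series the arithmetic average (0.20) is nearly double the geometric one (about 0.105).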
Continuous Compounding
The difficulty of manipulating geometric averages such as (1.4.3) motivates
another approach to compound returns, one which is not approximate and
also has important implications for modeling asset returns; this is the notion
of continuous compounding. The continuously compounded return or log return
$r_t$ of an asset is defined to be the natural logarithm of its gross return $(1+R_t)$:

$$r_t \equiv \log(1+R_t) = \log\frac{P_t}{P_{t-1}} = p_t - p_{t-1}, \tag{1.4.5}$$


where $p_t \equiv \log P_t$. When we wish to emphasize the distinction between $R_t$
and $r_t$, we shall refer to $R_t$ as a simple return. Our notation here deviates
slightly from our convention that lowercase letters denote the logs of up-
percase letters, since here we have $r_t = \log(1+R_t)$ rather than $\log(R_t)$; we
do this to maintain consistency with standard conventions.
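A minimal sketch of (1.4.5), computing log returns as first differences of log prices for a hypothetical dividend-free price path and confirming that they equal $\log(1+R_t)$. The function name `log_returns` is illustrative only.

```python
import math


def log_returns(prices):
    """Continuously compounded returns r_t = p_t - p_{t-1}, with p_t = log P_t, per (1.4.5)."""
    logp = [math.log(p) for p in prices]
    return [b - a for a, b in zip(logp[:-1], logp[1:])]


prices = [100.0, 110.0, 99.0]  # hypothetical dividend-free prices
r = log_returns(prices)
R = [b / a - 1.0 for a, b in zip(prices[:-1], prices[1:])]

# Same numbers computed as log(1 + R_t); since log(1 + x) <= x, r_t never exceeds R_t.
print(r, [math.log1p(x) for x in R])
```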


The advantages of continuously compounded returns become clear
when we consider multiperiod returns, since

$$\begin{aligned}
r_t(k) &\equiv \log(1+R_t(k)) = \log\big((1+R_t)\cdot(1+R_{t-1})\cdots(1+R_{t-k+1})\big) \\
&= \log(1+R_t) + \log(1+R_{t-1}) + \cdots + \log(1+R_{t-k+1}) \\
&= r_t + r_{t-1} + \cdots + r_{t-k+1},
\end{aligned} \tag{1.4.6}$$


and hence the continuously compounded multiperiod return is simply the 
sum of continuously compounded single-period returns. Compounding, 
a multiplicative operation, is converted to an additive operation by taking 
logarithms. However, the simplification is not merely in reducing multi- 
plication to addition (since we argued above that with modern calculators 
and computers, this is trivial), but more in the modeling of the statistical 
behavior of asset returns over time—it is far easier to derive the time-series 
properties of additive processes than of multiplicative processes, as we shall 
see in Chapter 2.
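The additivity in (1.4.6) is easy to confirm numerically. The sketch below draws a year of hypothetical daily simple returns from a normal distribution (an assumption made purely for illustration), sums their logs, and compares the sum with the log of the compounded gross return.

```python
import math
import random

random.seed(0)
# Hypothetical daily simple returns over one year, for illustration only.
simple = [random.gauss(0.0005, 0.01) for _ in range(250)]
log_rets = [math.log1p(R) for R in simple]

# (1.4.6): the multiperiod log return is the plain sum of single-period log returns ...
r_k = sum(log_rets)
# ... and it matches the log of the compounded gross return 1 + R_t(k).
gross_k = math.prod(1.0 + R for R in simple)
print(r_k, math.log(gross_k))
```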

Continuously compounded returns do have one disadvantage. The sim-
ple return on a portfolio of assets is a weighted average of the simple returns
on the assets themselves, where the weight on each asset is the share of the
portfolio's value invested in that asset. If portfolio $p$ places weight $w_i$ in as-
set $i$, then the return on the portfolio at time $t$, $R_{pt}$, is related to the returns
on individual assets, $R_{it}$, $i = 1, \ldots, N$, by $R_{pt} = \sum_{i=1}^{N} w_i R_{it}$. Unfortunately
continuously compounded returns do not share this convenient property.
Since the log of a sum is not the same as the sum of logs, $r_{pt}$ does not equal
$\sum_{i=1}^{N} w_i r_{it}$.
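The failure of log returns to aggregate across assets, and the fact that the discrepancy shrinks when returns are small, can both be seen in a two-asset example. In this sketch the weights, the returns, and the helper name `log_aggregation_gap` are hypothetical; because the logarithm is concave, the weighted average of asset log returns never exceeds the portfolio's log return when the weights are nonnegative and sum to one.

```python
import math


def log_aggregation_gap(weights, simple):
    """Portfolio log return minus the weighted average of asset log returns."""
    R_p = sum(w * R for w, R in zip(weights, simple))  # exact: R_pt = sum_i w_i R_it
    r_p = math.log1p(R_p)                               # portfolio's true log return
    r_avg = sum(w * math.log1p(R) for w, R in zip(weights, simple))
    return r_p - r_avg


weights = [0.5, 0.5]  # hypothetical equal-weighted two-asset portfolio
print(log_aggregation_gap(weights, [0.20, -0.10]))    # large returns: gap of about 0.01
print(log_aggregation_gap(weights, [0.002, -0.001]))  # small returns: gap of about 1e-6
```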

In empirical applications this problem is usually minor. When returns
are measured over short intervals of time, and are therefore close to zero,
the continuously compounded return on a portfolio is close to the weighted
average of the continuously compounded returns on the individual assets:
$r_{pt} \approx \sum_{i=1}^{N} w_i r_{it}$. We use this approximation in Chapter 3. Nonetheless
it is common to use simple returns when a cross-section of assets is being
studied, as in Chapters 4-6, and continuously compounded returns when
the temporal behavior of returns is the focus of interest, as in Chapters 2
and 7.

Figure 1.1. Dividend Payment Timing Convention


Dividend Payments 

For assets which make periodic dividend payments, we must modify our 
definitions of returns and compounding. Denote by $D_t$ the asset's dividend
payment at date $t$ and assume, purely as a matter of convention, that this
dividend is paid just before the date-$t$ price $P_t$ is recorded; hence $P_t$ is taken
to be the ex-dividend price at date $t$. Alternatively, one might describe $P_t$ as
an end-of-period asset price, as shown in Figure 1.1. Then the net simple
return at date $t$ may be defined as

$$R_t \equiv \frac{P_t + D_t}{P_{t-1}} - 1. \tag{1.4.7}$$

Multiperiod and continuously compounded returns may be obtained 
in the same way as in the no-dividends case. Note that the continuously
compounded return on a dividend-paying asset, $r_t = \log(P_t + D_t) - \log(P_{t-1})$,
is a nonlinear function of log prices and log dividends. When the ratio
of prices to dividends is not too variable, however, this function can be
approximated by a linear function of log prices and dividends, as discussed
in detail in Chapter 7.
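A minimal sketch of (1.4.7) and its log counterpart, assuming, as in the timing convention above, that `prices[t]` is the ex-dividend price $P_t$ and `dividends[t]` is the dividend $D_t$ paid just before it is recorded. The function name `dividend_returns` and the numbers are hypothetical.

```python
import math


def dividend_returns(prices, dividends):
    """Simple returns per (1.4.7) and log returns r_t = log(P_t + D_t) - log(P_{t-1}).

    prices[t] is the ex-dividend (end-of-period) price P_t, and dividends[t]
    is D_t, paid just before P_t is recorded; dividends[0] is unused.
    """
    simple, logs = [], []
    for t in range(1, len(prices)):
        simple.append((prices[t] + dividends[t]) / prices[t - 1] - 1.0)
        logs.append(math.log(prices[t] + dividends[t]) - math.log(prices[t - 1]))
    return simple, logs


# Hypothetical period: price 50 -> 51 with a 1.00 dividend.
R, r = dividend_returns([50.0, 51.0], [0.0, 1.0])
print(R, r)  # R = 52/50 - 1 = 0.04, r = log(1.04)
```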


In the limit where time is continuous, Itô's Lemma, discussed in Section 9.1.2 of Chapter 9,
can be used to relate simple and continuously compounded returns.


Excess Returns

It is often convenient to work with an asset's excess return, defined as the
difference between the asset's return and the return on some reference
asset. The reference asset is often assumed to be riskless and in practice is
usually a short-term Treasury bill return. Working with simple returns, the
simple excess return on asset $i$ is

$$Z_{it} \equiv R_{it} - R_{0t}, \tag{1.4.8}$$

where $R_{0t}$ is the reference return. Alternatively one can define a log excess
return as

$$z_{it} \equiv r_{it} - r_{0t}. \tag{1.4.9}$$


The excess return can also be thought of as the payoff on an arbitrage 
portfolio that goes long in asset $i$ and short in the reference asset, with no
net investment at the initial date. Since the initial net investment is zero, 
the return on the arbitrage portfolio is undefined but its dollar payoff is 
proportional to the excess return as defined above. 
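The two definitions of excess return and the zero-net-investment interpretation can be sketched together. The returns and the $1,000 position size below are hypothetical; the point is that borrowing at the reference rate to buy asset $i$ produces a dollar payoff proportional to the simple excess return (1.4.8).

```python
import math

# Hypothetical one-period returns: a stock and a Treasury bill (the reference asset).
R_i, R_0 = 0.03, 0.004

Z_i = R_i - R_0                          # simple excess return, (1.4.8)
z_i = math.log1p(R_i) - math.log1p(R_0)  # log excess return, (1.4.9)

# Zero net investment: borrow $1,000 at the reference rate and buy $1,000 of asset i.
position = 1000.0
payoff = position * (1.0 + R_i) - position * (1.0 + R_0)
print(Z_i, z_i, payoff)  # payoff equals position * Z_i
```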


1.4.2 The Marginal, Conditional, and Joint Distribution of Returns 


Having defined asset returns carefully, we can now begin to study their 
behavior across assets and over time. Perhaps the most important charac- 
teristic of asset returns is their randomness. The return of IBM stock over 
the next month is unknown today, and it is largely the explicit modeling 
of the sources and nature of this uncertainty that distinguishes financial 
economics from other social sciences. Although other branches of eco- 
nomics and sociology do have models of stochastic phenomena, in none 
of them does uncertainty play so central a role as in the pricing of finan- 
cial assets— without uncertainty, much of the financial economics literature, 
both theoretical and empirical, would be superfluous. Therefore, we must 


articulate at the very start the types of uncertainty that asset returns might
exhibit.


The Joint Distribution 
Consider a collection of $N$ assets at date $t$, each with return $R_{it}$ at date $t$,
where $t = 1, \ldots, T$. Perhaps the most general model of the collection of
returns $\{R_{it}\}$ is its joint distribution function:

$$G(R_{11}, \ldots, R_{N1}; R_{12}, \ldots, R_{N2}; \ldots; R_{1T}, \ldots, R_{NT}; \mathbf{x} \mid \boldsymbol{\theta}), \tag{1.4.10}$$

where $\mathbf{x}$ is a vector of state variables, variables that summarize the economic
environment in which asset returns are determined, and $\boldsymbol{\theta}$ is a vector of
fixed parameters that uniquely determines $G$. For notational convenience,
we shall suppress the dependence of $G$ on the parameters $\boldsymbol{\theta}$ unless it is
needed.
The probability law $G$ governs the stochastic behavior of asset returns
and $\mathbf{x}$, and represents the sum total of all knowable information about them.
We may then view financial econometrics as the statistical inference of $\boldsymbol{\theta}$,
given $G$ and realizations of $\{R_{it}\}$. Of course, (1.4.10) is far too general to
be of any use for statistical inference, and we shall have to place further
restrictions on $G$ in the coming sections and chapters. However, (1.4.10)
does serve as a convenient way to organize the many models of asset re-
turns to be developed here and in later chapters. For example, Chapters 2
through 6 deal exclusively with the joint distribution of $\{R_{it}\}$, leaving addi-
tional state variables $\mathbf{x}$ to be considered in Chapters 7 and 8. We write this
joint distribution as $G_R$.

Many asset pricing models, such as the Capital Asset Pricing Model
(CAPM) of Sharpe (1964), Lintner (1965a, b), and Mossin (1966) consid-
ered in Chapter 5, describe the joint distribution of the cross section of re-
turns $\{R_{1t}, \ldots, R_{Nt}\}$ at a single date $t$. To reduce (1.4.10) to this essentially
static structure, we shall have to assert that returns are statistically indepen-
dent through time and that the joint distribution of the cross-section of
returns is identical across time. Although such assumptions seem extreme,
they yield a rich set of implications for pricing financial assets. The CAPM,
for example, delivers an explicit formula for the trade-off between risk and
expected return, the celebrated security market line.


The Conditional Distribution

In Chapter 2, we place another set of restrictions on $G_R$ which will allow us
to focus on the dynamics of individual asset returns while abstracting from
cross-sectional relations between the assets. In particular, consider the joint
distribution $F$ of $\{R_{i1}, \ldots, R_{iT}\}$ for a given asset $i$, and observe that we may
always rewrite $F$ as the following product:

$$F(R_{i1}, \ldots, R_{iT}) = F_{i1}(R_{i1})\, F_{i2}(R_{i2} \mid R_{i1})\, F_{i3}(R_{i3} \mid R_{i2}, R_{i1}) \cdots F_{iT}(R_{iT} \mid R_{i,T-1}, \ldots, R_{i1}). \tag{1.4.11}$$


From (1.4.11), the temporal dependencies implicit in $\{R_{it}\}$ are apparent.
Issues of predictability in asset returns involve aspects of their conditional
distributions and, in particular, how the conditional distributions evolve
through time.

By placing further restrictions on the conditional distributions $F_{it}(\cdot)$, we
shall be able to estimate the parameters $\boldsymbol{\theta}$ implicit in (1.4.11) and exam-
ine the predictability of asset returns explicitly. For example, one version
of the random-walk hypothesis is obtained by the restriction that the con-
ditional distribution of return $R_{it}$ is equal to its marginal distribution, i.e.,
$F_{it}(R_{it} \mid \cdot) = F_{it}(R_{it})$. If this is the case, then returns are temporally indepen-
dent and therefore unpredictable using past returns. Weaker versions of the
random walk are obtained by imposing weaker restrictions on $F_{it}(R_{it} \mid \cdot)$.
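One practical consequence of the restriction $F_{it}(R_{it} \mid \cdot) = F_{it}(R_{it})$ is that past returns carry no information about future ones, so sample autocorrelations of returns should be near zero. The sketch below compares the first-order sample autocorrelation of simulated IID normal returns with that of a hypothetical AR(1) series, $r_t = 0.3\,r_{t-1} + \text{noise}$, chosen only as a contrast; it is an illustration, not one of the formal tests developed in Chapter 2.

```python
import random


def autocorr1(x):
    """Sample first-order autocorrelation of a sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den


random.seed(42)
T = 5000
# IID returns: each period's conditional distribution equals its marginal one.
iid = [random.gauss(0.0, 0.01) for _ in range(T)]
# Hypothetical predictable returns: r_t = 0.3 r_{t-1} + noise, so the past matters.
ar = [0.0]
for _ in range(T - 1):
    ar.append(0.3 * ar[-1] + random.gauss(0.0, 0.01))
print(autocorr1(iid), autocorr1(ar))  # near 0 versus near 0.3
```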


The Unconditional Distribution 
In cases where an asset return's conditional distribution differs from its
marginal or unconditional distribution, it is clearly the conditional distribu-
tion that is relevant for issues involving predictability. However, the proper-
ties of the unconditional distribution of returns may still be of some interest,
especially in cases where we expect predictability to be minimal. 

One of the most common models for asset returns is the temporally 
independently and identically distributed (IID) normal model, in which
returns are assumed to be independent over time (although perhaps cross-
sectionally correlated), identically distributed over time, and normally dis-
tributed. The original formulation of the CAPM employed this assumption
of normality, although returns were only implicitly assumed to be tempo-
rally IID (since it was a static “two-period” model). More recently, models
of asymmetric information such as Grossman (1989) and Grossman and 
Stiglitz (1980) also use normality. 

While the temporally IID normal model may be tractable, it suffers from at least two important drawbacks. First, most financial assets exhibit limited liability, so that the largest loss an investor can realize is his total investment and no more. This implies that the smallest net return achievable is $-1$ or $-100\%$. But since the normal distribution's support is the entire real line, this lower bound of $-1$ is clearly violated by normality. Of course, it may be argued that by choosing the mean and variance appropriately, the probability of realizations below $-1$ can be made arbitrarily small; however, it will never be zero, as limited liability requires.

Second, if single-period returns are assumed to be normal, then multiperiod returns cannot also be normal since they are the products of the single-period gross returns. Now the sums of normal single-period returns are indeed normal, but the sum of single-period simple returns does not have any economically meaningful interpretation. However, as we saw in Section 1.4.1, the sum of single-period continuously compounded returns does have a meaningful interpretation as a multiperiod continuously compounded return.


The Lognormal Distribution 


A sensible alternative is to assume that continuously compounded single-period returns $r_{it}$ are IID normal, which implies that single-period gross simple returns are distributed as IID lognormal variates, since $r_{it} \equiv \log(1 + R_{it})$. We may express the lognormal model then as

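$$r_{it} \sim \mathcal{N}(\mu_i, \sigma_i^2). \qquad (1.4.12)$$

Under the lognormal model, if the mean and variance of $r_{it}$ are $\mu_i$ and $\sigma_i^2$, respectively, then the mean and variance of simple returns are given by

$$E[R_{it}] = e^{\mu_i + \sigma_i^2/2} - 1 \qquad (1.4.13)$$

$$\mathrm{Var}[R_{it}] = e^{2\mu_i + \sigma_i^2}\left[e^{\sigma_i^2} - 1\right]. \qquad (1.4.14)$$

Alternatively, if we assume that the mean and variance of simple returns $R_{it}$ are $m_i$ and $s_i^2$, respectively, then under the lognormal model the mean and variance of $r_{it}$ are given by

$$E[r_{it}] = \log\left(\frac{1 + m_i}{\sqrt{1 + \left(\frac{s_i}{1 + m_i}\right)^2}}\right) \qquad (1.4.15)$$

$$\mathrm{Var}[r_{it}] = \log\left[1 + \left(\frac{s_i}{1 + m_i}\right)^2\right]. \qquad (1.4.16)$$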

The lognormal model has the added advantage of not violating limited liability, since limited liability yields a lower bound of zero on $(1 + R_{it})$, which is satisfied by $(1 + R_{it}) = e^{r_{it}}$ when $r_{it}$ is assumed to be normal.
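The two moment conversions can be checked numerically. The following is a minimal sketch (the function names are illustrative, not from the text) that implements (1.4.13)-(1.4.14) and their inverse (1.4.15)-(1.4.16); a round trip through both should recover the original $\mu_i$ and $\sigma_i^2$ up to floating-point error. The monthly inputs in the usage lines are made-up values.

```python
import math

def simple_moments_from_log(mu: float, sigma2: float) -> tuple[float, float]:
    """E[R] and Var[R] for R = exp(r) - 1 when r ~ N(mu, sigma2): (1.4.13)-(1.4.14)."""
    mean = math.exp(mu + sigma2 / 2.0) - 1.0
    var = math.exp(2.0 * mu + sigma2) * (math.exp(sigma2) - 1.0)
    return mean, var

def log_moments_from_simple(m: float, s2: float) -> tuple[float, float]:
    """E[r] and Var[r] implied by E[R] = m, Var[R] = s2 under lognormality: (1.4.15)-(1.4.16)."""
    ratio = s2 / (1.0 + m) ** 2          # (s_i / (1 + m_i))^2
    mu = math.log((1.0 + m) / math.sqrt(1.0 + ratio))
    sigma2 = math.log(1.0 + ratio)
    return mu, sigma2

# Illustrative monthly inputs: mean log return 1%, log-return volatility 5%.
m, s2 = simple_moments_from_log(0.01, 0.05 ** 2)
print(m, s2)                               # simple-return mean exceeds the log mean
print(log_moments_from_simple(m, s2))      # recovers (0.01, 0.0025) up to rounding
```

Note that $E[R_{it}] > e^{\mu_i} - 1$ whenever $\sigma_i^2 > 0$: the $\sigma_i^2/2$ term in (1.4.13) is the convexity adjustment that separates the two kinds of average return.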

The lognormal model has a long and illustrious history, beginning with the dissertation of the French mathematician Louis Bachelier (1900), which contained the mathematics of Brownian motion and heat conduction, five years prior to Einstein's (1905) famous paper. For other reasons that will become apparent in later chapters (see, especially, Chapter 9), the lognormal model has become the workhorse of the financial asset pricing literature.

But as attractive as the lognormal model is, it is not consistent with all the properties of historical stock returns. At short horizons, historical returns show weak evidence of skewness and strong evidence of excess kurtosis. The skewness, or normalized third moment, of a random variable $x$ with mean $\mu$ and variance $\sigma^2$ is defined by

$$S[x] \equiv E\left[\frac{(x - \mu)^3}{\sigma^3}\right]. \qquad (1.4.17)$$

The kurtosis, or normalized fourth moment, of $x$ is defined by

$$K[x] \equiv E\left[\frac{(x - \mu)^4}{\sigma^4}\right]. \qquad (1.4.18)$$

The normal distribution has skewness equal to zero, as do all other symmetric distributions with a finite third moment. The normal distribution has kurtosis equal to 3, but fat-tailed distributions with extra probability mass in the tail areas have higher or even infinite kurtosis.


Skewness and kurtosis can be estimated in a sample of data by constructing the obvious sample averages: the sample mean

$$\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} x_t, \qquad (1.4.19)$$

the sample variance

$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T} (x_t - \hat{\mu})^2, \qquad (1.4.20)$$

the sample skewness

$$\hat{S} = \frac{1}{T\hat{\sigma}^3}\sum_{t=1}^{T} (x_t - \hat{\mu})^3, \qquad (1.4.21)$$

and the sample kurtosis

$$\hat{K} = \frac{1}{T\hat{\sigma}^4}\sum_{t=1}^{T} (x_t - \hat{\mu})^4. \qquad (1.4.22)$$

In large samples of normally distributed data, the estimators $\hat{S}$ and $\hat{K}$ are normally distributed with means 0 and 3 and variances $6/T$ and $24/T$, respectively (see Stuart and Ord [1987, Vol. 1]). Since 3 is the kurtosis of the normal distribution, sample excess kurtosis is defined to be sample kurtosis less 3. Sample estimates of skewness for daily US stock returns tend to be negative for stock indexes but close to zero or positive for individual stocks. Sample estimates of excess kurtosis for daily US stock returns are large and positive for both indexes and individual stocks, indicating that returns have more mass in the tail areas than would be predicted by a normal distribution.
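The estimators (1.4.19)-(1.4.22) translate directly into code. A minimal sketch in plain Python (the function name `sample_moments` is hypothetical); it uses the $1/T$ divisor in every average, so results can differ slightly from statistical packages that apply small-sample corrections. The input series in the usage line is made up.

```python
import math
from typing import Sequence

def sample_moments(x: Sequence[float]) -> dict:
    """Sample mean, variance, skewness and kurtosis following (1.4.19)-(1.4.22)."""
    T = len(x)
    if T < 2:
        raise ValueError("need at least two observations")
    mu = sum(x) / T                                      # (1.4.19)
    dev = [xt - mu for xt in x]
    var = sum(d * d for d in dev) / T                    # (1.4.20), 1/T divisor
    if var == 0.0:
        raise ValueError("sample variance is zero; skewness and kurtosis undefined")
    sigma = math.sqrt(var)
    skew = sum(d ** 3 for d in dev) / (T * sigma ** 3)   # (1.4.21)
    kurt = sum(d ** 4 for d in dev) / (T * sigma ** 4)   # (1.4.22)
    return {"mean": mu, "variance": var, "skewness": skew,
            "kurtosis": kurt, "excess_kurtosis": kurt - 3.0}

# Made-up daily returns (percent) with one large crash-like observation.
print(sample_moments([0.2, -0.1, 0.3, 0.1, -0.2, 0.0, 0.1, -5.0, 0.2, 0.1]))
```

A single large negative observation is enough to push the sample skewness below zero and the sample kurtosis well above that of the remaining points, which is why these statistics are so sensitive to crash days.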


Stable Distributions

Early studies of stock market returns attempted to capture this excess kurtosis by modeling the distribution of continuously compounded returns as a member of the stable class (also called the stable Pareto-Lévy or stable Paretian), of which the normal is a special case. The stable distributions are a natural generalization of the normal in that, as their name suggests, they are stable under addition, i.e., a sum of independent and identically distributed stable random variables is also a stable random variable. However, nonnormal stable distributions have more probability mass in the tail areas than the normal. In fact, the nonnormal stable distributions are so fat-tailed that their variance and all higher moments are infinite. Sample estimates of variance or kurtosis for random variables with


The French probabilist Paul Lévy (1924) was perhaps the first to initiate a general investigation of stable distributions and provided a complete characterization of them through their log-characteristic functions (see below). Lévy (1925) also showed that the tail probabilities of stable distributions approximate those of the Pareto distribution, hence the term “stable Pareto-Lévy” or “stable Paretian” distribution. For applications to financial asset returns, see Blattberg and Gonedes (1974); Fama (1965); Fama and Roll (1971); Fielitz (1976); Fielitz and Rozell (1983); Granger and Morgenstern (1970); Hagerman (1978); Hsu, Miller, and Wichern (1974); Mandelbrot (1963); Mandelbrot and Taylor (1967); Officer (1972); Samuelson (1967, 1976); Simkowitz and Beedles (1980); and Tucker (1992).
Figure 1.2. Comparison of Stable and Normal Density Functions


these distributions will not converge as the sample size increases, but will 
tend to increase indefinitely. 

Closed-form expressions for the density functions of stable random variables are available for only three special cases: the normal, the Cauchy, and the Bernoulli cases. Figure 1.2 illustrates the Cauchy distribution, with density function

$$f(x; \delta, \gamma) = \frac{1}{\pi}\,\frac{\gamma}{\gamma^2 + (x - \delta)^2}. \qquad (1.4.23)$$

In Figure 1.2, (1.4.23) is graphed with parameters $\delta = 0$ and $\gamma = 1$, and it is apparent from the comparison with the normal density function (dashed lines) that the Cauchy has fatter tails than the normal.
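The difference in the tails can be quantified from closed forms. A minimal sketch, assuming the normal benchmark is the standard normal (the text does not state its variance), with hypothetical helper names: it evaluates (1.4.23) and compares two-sided tail probabilities $P(|X| > c)$, using $1 - (2/\pi)\arctan(c)$ for the standard Cauchy and $\mathrm{erfc}(c/\sqrt{2})$ for the standard normal.

```python
import math

def cauchy_pdf(x: float, delta: float = 0.0, gamma: float = 1.0) -> float:
    """Cauchy density (1.4.23) with location delta and scale gamma."""
    return gamma / (math.pi * (gamma ** 2 + (x - delta) ** 2))

def normal_pdf(x: float) -> float:
    """Standard normal density (assumed benchmark for the dashed curve)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cauchy_two_sided_tail(c: float) -> float:
    """P(|X| > c) for a standard Cauchy (delta = 0, gamma = 1)."""
    return 1.0 - (2.0 / math.pi) * math.atan(c)

def normal_two_sided_tail(c: float) -> float:
    """P(|Z| > c) for a standard normal."""
    return math.erfc(c / math.sqrt(2.0))

for c in (1.0, 3.0, 5.0):
    print(f"c={c}: Cauchy {cauchy_two_sided_tail(c):.4f}  normal {normal_two_sided_tail(c):.2e}")
```

The Cauchy tail decays like $1/c$, whereas the normal tail decays faster than any power of $c$, so the gap widens rapidly as $c$ grows.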

Although stable distributions were popular in the 1960's and early 1970's, 
they are less commonly used today. They have fallen out of favor partly be- 
cause they make theoretical modelling so difficult; standard finance theory 


However, Lévy (1925) derived the following explicit expression for the logarithm of the characteristic function $\varphi(t)$ of any stable random variable $X$: $\log \varphi(t) \equiv \log E[e^{itX}] = i\delta t - \gamma |t|^{\alpha}\left[1 - i\beta\,\mathrm{sgn}(t)\tan(\alpha\pi/2)\right]$, where $(\alpha, \beta, \delta, \gamma)$ are the four parameters that characterize each stable distribution. $\delta \in (-\infty, \infty)$ is said to be the location parameter, $\beta \in [-1, 1]$ is the skewness index, $\gamma \in (0, \infty)$ is the scale parameter, and $\alpha \in (0, 2]$ is the exponent. When $\alpha = 2$, the stable distribution reduces to a normal. As $\alpha$ decreases from 2 to 0, the tail areas of the stable distribution become increasingly “fatter” than the normal. When $\alpha \in (1, 2)$, the stable distribution has a finite mean given by $\delta$, but when $\alpha \in (0, 1]$, even the mean is infinite. The parameter $\beta$ measures the symmetry of the stable distribution; when $\beta = 0$ the distribution is symmetric, and when $\beta > 0$ (or $\beta < 0$) the distribution is skewed to the right (or left). When $\beta = 0$ and $\alpha = 1$ we have the Cauchy distribution, and when $\alpha = 1/2$, $\beta = 1$, $\delta = 0$, and $\gamma = 1$ we have the Bernoulli distribution.
almost always requires finite second moments of returns, and often finite higher moments as well. Stable distributions also have some counterfactual implications. First, they imply that sample estimates of the variance and higher moments of returns will tend to increase as the sample size increases, whereas in practice these estimates seem to converge. Second, they imply that long-horizon returns will be just as non-normal as short-horizon returns (since long-horizon returns are sums of short-horizon returns, and these distributions are stable under addition). In practice the evidence for non-normality is much weaker for long-horizon returns than for short-horizon returns.
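The second counterfactual implication can be checked directly from Lévy's log-characteristic function given in the footnote above. Below is a minimal sketch, assuming that parameterization (the function names are hypothetical, not a library API): summing $n$ IID stable variables multiplies the log-characteristic function by $n$, and rescaling the sum by $n^{1/\alpha}$ (with $\delta = 0$) recovers exactly the same distribution, so aggregation never moves a nonnormal stable return toward the normal. The case $\alpha = 1$ with $\beta \neq 0$ needs a different formula and is rejected here.

```python
import math

def stable_log_cf(t: float, alpha: float, beta: float = 0.0,
                  delta: float = 0.0, gamma: float = 1.0) -> complex:
    """log E[exp(itX)] for stable X in the (alpha, beta, delta, gamma) form of Levy (1925)."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    if not -1.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [-1, 1]")
    if alpha == 1.0 and beta != 0.0:
        raise ValueError("alpha = 1 with beta != 0 needs a different (logarithmic) form")
    sign_t = (t > 0) - (t < 0)
    # Skip tan() when beta == 0 so that alpha = 1 (Cauchy) never evaluates tan(pi/2).
    skew = 0.0 if beta == 0.0 else beta * sign_t * math.tan(alpha * math.pi / 2.0)
    return 1j * delta * t - gamma * abs(t) ** alpha * (1.0 - 1j * skew)

def log_cf_of_scaled_sum(t: float, n: int, alpha: float, beta: float = 0.0,
                         gamma: float = 1.0) -> complex:
    """Log CF of (X_1 + ... + X_n) / n**(1/alpha) for IID stable X_i with delta = 0."""
    return n * stable_log_cf(t / n ** (1.0 / alpha), alpha, beta, 0.0, gamma)

# A Cauchy case and a skewed alpha = 1.5 case: the rescaled 20-period sum has the same law.
for alpha, beta in ((1.0, 0.0), (1.5, 0.3)):
    for t in (-2.0, 0.5, 3.0):
        gap = abs(stable_log_cf(t, alpha, beta) - log_cf_of_scaled_sum(t, 20, alpha, beta))
        print(alpha, beta, t, gap)
```

With $\alpha = 2$ and $\beta = 0$ the same function gives $-\gamma t^2$, the log-characteristic function of a normal with variance $2\gamma$, which is the one case in which the stable family has finite variance.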

Recent research tends instead to model returns as drawn from a fat-tailed distribution with finite higher moments, such as the $t$ distribution (with sufficiently many degrees of freedom), or as drawn from a mixture of distributions. For example, the return might
be conditionally normal, conditional on a variance parameter which is itself 
random; then the unconditional distribution of returns is a mixture of nor- 
mal distributions, some with small conditional variances that concentrate 
mass around the mean and others with large conditional variances that put 
mass in the tails of the distribution. The result is a fat-tailed unconditional 
distribution with a finite variance and finite higher moments. Since all 
moments are finite, the Central Limit Theorem applies and long-horizon 
returns will tend to be closer to the normal distribution than short-horizon 
returns. It is natural to model the conditional variance as a time-series 
process, and we discuss this in detail in Chapter 12. 
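The mixture argument can be made concrete with a two-state example. A minimal sketch, assuming for simplicity that the variance state is drawn independently each period (the text instead models it as a persistent time-series process, which this sketch ignores); the variance levels and probabilities are illustrative, not estimates, and the function names are hypothetical. Conditional on $\sigma^2$, a zero-mean normal has $E[x^2 \mid \sigma^2] = \sigma^2$ and $E[x^4 \mid \sigma^2] = 3\sigma^4$, so the unconditional kurtosis is $3E[\sigma^4]/(E[\sigma^2])^2$, which exceeds 3 whenever $\sigma^2$ is not constant. For IID draws, the excess kurtosis of a sum of $n$ terms equals the single-period excess kurtosis divided by $n$.

```python
from typing import Sequence

def mixture_kurtosis(variances: Sequence[float], probs: Sequence[float]) -> float:
    """Kurtosis of x where x | s2 ~ N(0, s2) and s2 takes values `variances` with `probs`."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to one")
    m2 = sum(p * v for p, v in zip(probs, variances))            # E[x^2] = E[s2]
    m4 = sum(p * 3.0 * v * v for p, v in zip(probs, variances))  # E[x^4] = 3 E[s2^2]
    return m4 / (m2 * m2)

def excess_kurtosis_of_iid_sum(single_period_excess: float, n: int) -> float:
    """Excess kurtosis of a sum of n IID terms (fourth cumulants and variances both add)."""
    return single_period_excess / n

# Illustrative "calm" and "turbulent" states with variances 1 and 25, equally likely.
k1 = mixture_kurtosis([1.0, 25.0], [0.5, 0.5])
for n in (1, 5, 20):
    print(n, excess_kurtosis_of_iid_sum(k1 - 3.0, n))
```

Because every moment of the mixture is finite, aggregation shrinks the excess kurtosis toward zero, in contrast to the stable case.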


An Empirical Illustration 
Table 1.1 contains some sample statistics for individual and aggregate stock returns from the Center for Research in Security Prices (CRSP) for 1962 to 1994 which illustrate some of the issues discussed in the previous sections. Sample moments, calculated in the straightforward way described in (1.4.19)-(1.4.22), are reported for value- and equal-weighted indexes of stocks listed on the New York Stock Exchange (NYSE) and American Stock Exchange (AMEX), and for ten individual stocks. The individual stocks were selected from market-capitalization deciles using 1979 end-of-year market capitalizations for all stocks in the CRSP NYSE/AMEX universe, where International Business Machines is the largest decile's representative and Continental Materials Corp. is the smallest decile's representative.
Panel A reports statistics for daily returns. The daily index returns have extremely high sample excess kurtosis, 34.9 and 26.0 respectively, a clear sign of fat tails. Although the excess kurtosis estimates for daily individual stock returns are generally less than those for the indexes, they are still large, ranging from 3.35 to 59.4. Since there are 8179 observations, the standard error for the kurtosis estimate under the null hypothesis of normality is $\sqrt{24/8179} = 0.054$, so these estimates of excess kurtosis are overwhelmingly statistically significant. The skewness estimates are negative for the daily index returns, $-1.33$ and $-0.93$ respectively, but generally positive for the individual stock returns, ranging from $-0.18$ to 2.25. Many of the skewness estimates are also statistically significant, as the standard error under the null hypothesis of normality is $\sqrt{6/8179} = 0.027$.

Panel B reports sample statistics for monthly returns. These are considerably less leptokurtic than daily returns: the value- and equal-weighted CRSP monthly index returns have excess kurtosis of only 2.42 and 4.14, respectively, an order of magnitude smaller than the excess kurtosis of daily returns. As there are only 390 observations, the standard error for the kurtosis estimate is also much larger, 0.248. This is one piece of evidence that has led researchers to use fat-tailed distributions with finite higher moments, for which the Central Limit Theorem applies and drives longer-horizon returns towards normality.
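The standard-error arithmetic quoted above is easy to reproduce. A minimal sketch (the function name is hypothetical) using only the large-sample formulas $\sqrt{6/T}$ and $\sqrt{24/T}$ from the text and two excess-kurtosis entries copied from Table 1.1; the ratios printed at the end are rough $z$-statistics under the normal null, not a formal joint test of normality.

```python
import math

def normal_null_standard_errors(T: int) -> tuple[float, float]:
    """Large-sample standard errors of sample skewness and kurtosis under normality."""
    return math.sqrt(6.0 / T), math.sqrt(24.0 / T)

# Sample sizes quoted in the text for Table 1.1.
for label, T in (("daily", 8179), ("monthly", 390)):
    se_skew, se_kurt = normal_null_standard_errors(T)
    print(f"{label}: T={T}, se(skew)={se_skew:.3f}, se(kurt)={se_kurt:.3f}")

# Rough z-statistics for the value-weighted index's excess kurtosis (Table 1.1).
_, se_daily = normal_null_standard_errors(8179)
_, se_monthly = normal_null_standard_errors(390)
print("daily VW index:", 34.92 / se_daily)
print("monthly VW index:", 2.42 / se_monthly)
```

Even the much smaller monthly estimate is several standard errors from zero, so the contrast between panels reflects a large drop in kurtosis rather than a failure to detect it.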


1.5 Market Efficiency 


The origins of the Efficient Markets Hypothesis (EMH) can be traced back at least as far as the pioneering theoretical contribution of Bachelier (1900) and the empirical research of Cowles (1933). The modern literature in economics begins with Samuelson (1965), whose contribution is neatly summarized by the title of his article: “Proof that Properly Anticipated Prices Fluctuate Randomly.” In an informationally efficient market (not to be confused with an allocationally or Pareto-efficient market), price changes must be unforecastable if they are properly anticipated, i.e., if they fully incorporate the expectations and information of all market participants.
Fama (1970) summarizes this idea in his classic survey by writing: “A market in which prices always ‘fully reflect’ available information is called ‘efficient.’” Fama's use of quotation marks around the words “fully reflect” indicates that these words are a form of shorthand and need to be explained more fully. More recently, Malkiel (1992) has offered the following more explicit definition:


A capital market is said to be efficient if it fully and correctly reflects all relevant information in determining security prices. Formally, the market is said to be efficient with respect to some information set... if security prices would be unaffected by revealing that information to all participants. Moreover, efficiency with respect to an information set


Bernstein (1992) discusses the contributions of Bachelier, Cowles, Samuelson, and many other early authors. The articles reprinted in Lo's edited collection include some of the most important papers in this literature.
Table 1.1. Stock market returns, 1962 to 1994.

                                            Standard              Excess
Security                              Mean Deviation  Skewness  Kurtosis   Minimum   Maximum

Panel A: Daily Returns

Value-Weighted Index                 0.044      0.82     -1.33     34.92    -18.10      8.87
Equal-Weighted Index                 0.073      0.76     -0.93     26.03    -14.19      9.83
International Business Machines      0.039      1.42     -0.18     12.48    -22.96     11.72
General Signal Corp.                 0.054      1.66      0.01      3.35    -13.46      9.43
Wrigley Co.                          0.072      1.45     -0.00     11.03    -18.67     11.89
Interlake Corp.                      0.043      2.16      0.72     12.35    -17.24     23.08
Raytech Corp.                        0.050      3.39      2.25     59.40    -57.90     75.00
Ampco-Pittsburgh Corp.               0.053      2.41      0.66      5.02    -19.05     19.18
Energen Corp.                        0.054      1.41      0.27      5.91    -12.82     11.11
General Host Corp.                   0.070      2.79      0.74      6.18    -23.53     22.92
Garan Inc.                           0.079      2.35      0.72      7.13    -16.67     19.07
Continental Materials Corp.          0.143      5.24      0.93      6.49    -26.92     50.00

Panel B: Monthly Returns

Value-Weighted Index                  0.96      4.33     -0.29      2.42    -21.81     16.51
Equal-Weighted Index                  1.25      5.77      0.07      4.14    -26.80     33.17
International Business Machines       0.81      6.18     -0.14      0.83    -26.19     18.95
General Signal Corp.                  1.17      8.19     -0.02      1.87    -36.77     29.73
Wrigley Co.                           1.51      6.68      0.30      1.31    -20.26     29.72
Interlake Corp.                       0.86      9.38      0.67      4.09    -30.28     54.84
Raytech Corp.                         0.83     14.88      2.73     22.70    -45.65    142.11
Ampco-Pittsburgh Corp.                1.06     10.64      0.77      2.04    -36.08     46.94
Energen Corp.                         1.10      5.75      1.47     12.47    -24.61     48.36
General Host Corp.                    1.33     11.67      0.35      1.11    -58.05     42.86
Garan Inc.                            1.64     11.30      0.76      2.30    -35.48     51.60
Continental Materials Corp.           1.64     17.76      1.13      3.33    -58.09     84.78

Summary statistics for daily and monthly returns (in percent) of CRSP equal- and value-weighted stock indexes and ten individual securities continuously listed over the entire sample period from July 3, 1962 to December 30, 1994. Individual securities are selected to represent stocks in each size decile. Statistics are defined in (1.4.19)-(1.4.22).


... implies that it is impossible to make economic profits by trading on the basis of [that information set].


Malkiel's first sentence repeats Fama's definition. His second and third sentences expand the definition in two alternative ways. The second sentence suggests that market efficiency can be tested by revealing information to market participants and measuring the reaction of security prices. If prices do not move when information is revealed, then the market is efficient with respect to that information. Although this is clear conceptually, it is hard to carry out such a test in practice (except perhaps in a laboratory).
Malkiel's third sentence suggests an alternative way to judge the efficiency of a market, by measuring the profits that can be made by trading on information. This idea is the foundation of almost all the empirical work on market efficiency. It has been used in two main ways. First, many researchers have tried to measure the profits earned by market professionals such as mutual fund managers. If these managers achieve superior returns (after adjustment for risk) then the market is not efficient with respect to the information possessed by the managers. This approach has the advantage that it concentrates on real trading by real market participants, but it has the disadvantage that one cannot directly observe the information used by the managers in their trading strategies (see Fama [1970, 1991] for a thorough review of this literature).
As an alternative, one can ask whether hypothetical trading based on an explicitly specified information set would earn superior returns. To implement this approach, one must first choose an information set. The classic taxonomy of information sets, due to Roberts (1967), distinguishes among

Weak-Form Efficiency: The information set includes only the history of prices or returns themselves.

Semistrong-Form Efficiency: The information set includes all information known to all market participants (publicly available information).

Strong-Form Efficiency: The information set includes all information known to any market participant (private information).


The next step is to specify a model of “normal” returns. Here the classic assumption is that the normal returns on a security are constant over time, but in recent years there has been increased interest in equilibrium models with time-varying normal security returns.

Finally, abnormal security returns are computed as the difference between the return on a security and its normal return, and forecasts of the abnormal returns are constructed using the chosen information set. If the abnormal security return is unforecastable, and in this sense “random,” then the hypothesis of market efficiency is not rejected.



1.5.1 Efficient Markets and the Law of Iterated Expectations

The idea that efficient security returns should be random has often caused confusion. Many people seem to think that an efficient security price should be smooth rather than random. Black (1971) has attacked this idea rather effectively:


A perfect market for a stock is one in which there are no profits to 
be made by people who have no special information about the com- 
pany, and in which it is difficult even for people who do have special 
information to make profits, because the price adjusts so rapidly as the 
information becomes available.... Thus we would like to see randomness 
in the prices of successive transactions, rather than great continuity. . . . 
Randomness means that a series of small upward movements (or small 
downward movements) is very unlikely. If the price is going to move up, 
it should move up all at once, rather than in a series of small steps. 
Large price movements are desirable, so long as they are not consistently 
followed by price movements in the opposite direction. 


Underlying this confusion may be a belief that returns cannot be random if security prices are determined by discounting future cash flows. Smith (1968), for example, writes: “I suspect that even if the random walkers announced a perfect mathematic proof of randomness, I would go on believing that in the long run future earnings influence present value.”

In fact, the discounted present value model of a security price is entirely consistent with randomness in security returns. The key to understanding this is the so-called Law of Iterated Expectations. To state this result we define information sets $I_t$ and $J_t$, where $I_t \subset J_t$, so all the information in $I_t$ is also in $J_t$ but $J_t$ is superior because it contains some extra information. We consider expectations of a random variable $X$ conditional on these information sets, written $E[X \mid I_t]$ or $E[X \mid J_t]$. The Law of Iterated Expectations says that $E[X \mid I_t] = E\big[E[X \mid J_t] \mid I_t\big]$. In words, if one has limited information $I_t$, the best forecast one can make of a random variable $X$ is the forecast of the forecast one would make of $X$ if one had superior information $J_t$. This can be rewritten as $E\big[X - E[X \mid J_t] \mid I_t\big] = 0$, which has an intuitive interpretation: one cannot use limited information $I_t$ to predict the forecast error one would make if one had superior information $J_t$.
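A small discrete example makes the law concrete. The sketch below is purely illustrative (the three-coin setup, the payoff `X`, and the helper names are hypothetical, not from the text): the limited information is knowledge of the first flip, the superior information is knowledge of the first two flips, and exact fractions confirm both the iterated-forecast identity and the zero-forecast-error form.

```python
from fractions import Fraction
from itertools import product

# Hypothetical setup: three fair coin flips (1 = heads), all 8 outcomes equally likely.
OUTCOMES = list(product((0, 1), repeat=3))

def X(w):
    """Illustrative random variable: squared number of heads."""
    return Fraction(sum(w) ** 2)

def cond_exp(func, k, w):
    """E[func | first k flips], evaluated at outcome w (equal weights on matching outcomes)."""
    matching = [v for v in OUTCOMES if v[:k] == w[:k]]
    return sum(Fraction(func(v)) for v in matching) / len(matching)

def forecast_with_superior_info(w):
    """E[X | first two flips]."""
    return cond_exp(X, 2, w)

for w in OUTCOMES:
    direct = cond_exp(X, 1, w)                                  # E[X | limited]
    iterated = cond_exp(forecast_with_superior_info, 1, w)      # E[E[X | superior] | limited]
    error = cond_exp(lambda v: X(v) - forecast_with_superior_info(v), 1, w)
    print(w[:1], direct, iterated, error)
```

The printed `direct` and `iterated` columns agree for every outcome, and the `error` column is identically zero.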

Samuelson (1965) was the first to show the relevance of the Law of Iterated Expectations for security market analysis; LeRoy (1989) gives a lucid review of the argument. We discuss the point in detail in Chapter 7, but a brief summary may be helpful here. Suppose that a security price at time $t$, $P_t$, can be written as the rational expectation of some “fundamental value” $V^*$, conditional on information $I_t$ available at time $t$. Then we have

$$P_t = E[V^* \mid I_t] = E_t V^*. \qquad (1.5.1)$$

The same equation holds one period ahead, so

$$P_{t+1} = E[V^* \mid I_{t+1}] = E_{t+1} V^*. \qquad (1.5.2)$$

But then the expectation of the change in the price over the next period is

$$E_t[P_{t+1} - P_t] = E_t\big[E_{t+1}[V^*] - E_t[V^*]\big] = 0, \qquad (1.5.3)$$

because $I_t \subset I_{t+1}$, so $E_t\big[E_{t+1}[V^*]\big] = E_t[V^*]$ by the Law of Iterated Expectations. Thus realized changes in prices are unforecastable given information in the set $I_t$.
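The same logic can be traced on a finite tree. In this sketch the fundamental value and its dynamics are hypothetical (three fair coin flips, each multiplying a payoff of 100 by 11/10 or 19/20, with made-up helper names), and the information available at time $t$ is taken to be the first $t$ flips. Prices are computed as conditional expectations of $V^*$, and the check confirms that the expected one-period price change is zero at every node, even though realized price changes are not.

```python
from fractions import Fraction
from itertools import product

N_FLIPS = 3
OUTCOMES = list(product((0, 1), repeat=N_FLIPS))   # equally likely paths
UP, DOWN = Fraction(11, 10), Fraction(19, 20)      # hypothetical growth factors

def fundamental_value(path):
    """Hypothetical V*: 100 compounded by UP or DOWN for each flip."""
    v = Fraction(100)
    for flip in path:
        v *= UP if flip else DOWN
    return v

def price(t, path):
    """P_t = E[V* | first t flips of path]."""
    matching = [w for w in OUTCOMES if w[:t] == path[:t]]
    return sum(fundamental_value(w) for w in matching) / len(matching)

def expected_price_change(t, path):
    """E_t[P_{t+1} - P_t] at the node reached by the first t flips of path."""
    matching = [w for w in OUTCOMES if w[:t] == path[:t]]
    return sum(price(t + 1, w) - price(t, w) for w in matching) / len(matching)

for t in range(N_FLIPS):
    print(t, {path[:t]: expected_price_change(t, path) for path in OUTCOMES})
```

Each printed dictionary contains only zeros, while, for example, `price(1, (1, 0, 0))` differs from `price(0, (1, 0, 0))`: prices move, but not predictably.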


1.5.2 Is Market Efficiency Testable?


Although the empirical methodology summarized here is well-established, 
there are some serious difficulties in interpreting its results. First, any test of 
efficiency must assume an equilibrium model that defines normal security 
returns. If efficiency is rejected, this could be because the market is truly inefficient or because an incorrect equilibrium model has been assumed. This joint hypothesis problem means that market efficiency as such can never be rejected.

Second, perfect efficiency is an unrealistic benchmark that is unlikely to hold in practice. Even in theory, as Grossman and Stiglitz (1980) have shown, abnormal returns will exist if there are costs of gathering and processing information. These returns are necessary to compensate investors for their information-gathering and information-processing expenses, and are no longer abnormal when these expenses are properly accounted for. In a large and liquid market, information costs are likely to justify only small abnormal returns, but it is difficult to say how small, even if such costs could be measured precisely.

The notion of relative efficiency (the efficiency of one market measured against another, e.g., the New York Stock Exchange vs. the Paris Bourse, futures markets vs. spot markets, or auction vs. dealer markets) may be a more useful concept than the all-or-nothing view taken by much of the traditional market-efficiency literature. The advantages of relative efficiency over absolute efficiency are easy to see by way of an analogy. Physical systems are often given an efficiency rating based on the relative proportion of energy or fuel converted to useful work. For example, a piston engine may be rated at 60% efficiency, meaning that on average 60% of the energy contained in the engine's fuel is used to turn the crankshaft, with the remaining 40% lost to other forms of work such as heat, light, or noise.

Few engineers would ever consider performing a statistical test to determine whether or not a given engine is perfectly efficient; such an engine exists only in the idealized frictionless world of the imagination. But measuring relative efficiency (relative to the frictionless ideal) is commonplace. Indeed, we have come to expect such measurements for many household products: air conditioners, hot water heaters, refrigerators, etc. Similarly, market efficiency is an idealization that is economically unrealizable, but that serves as a useful benchmark for measuring relative efficiency.

For these reasons, in this book we do not take a stand on market efficiency itself, but focus instead on the statistical methods that can be used to test the joint hypothesis of market efficiency and market equilibrium. Although many of the techniques covered in these pages are central to the market-efficiency debate (tests of variance bounds, Euler equations, the CAPM and the APT), we feel that they can be more profitably applied to measuring efficiency rather than to testing it. And if some markets turn out to be particularly inefficient, the diligent reader of this text will be well-prepared to take advantage of the opportunity.


The Predictability of Asset Returns 


ONE OF THE EARLIEST and most enduring questions of financial econometrics is whether financial asset prices are forecastable. Perhaps because of
the obvious analogy between financial investments and games of chance, 
mathematical models of asset prices have an unusually rich history that pre- 
dates virtually every other aspect of economic analysis. The fact that many 
prominent mathematicians and scientists have applied their considerable 
skills to forecasting financial securities prices is a testament to the fascination 
and the challenges of this problem. Indeed, modern financial economics is 
firmly rooted in early attempts to "beat the market," an endeavor that is still 
of current interest, discussed and debated in journal articles, conferences, 
and at cocktail parties! 

In this chapter, we consider the problem of forecasting future price 
changes, using only past price changes to construct our forecasts. Although 
restricting our forecasts to be functions of past price changes may seem too 
restrictive to be of any interest—after all, investors are constantly bombarded 
with vast quantities of diverse information—nevertheless, even as simple a 
problem as this can yield surprisingly rich insights into the behavior of asset 
prices. We shall see that the martingale and the random walk, two of the most important ideas in probability theory and financial economics, grew out of this relatively elementary exercise. Moreover, despite the fact that we shall present more sophisticated models of asset prices in Chapters 4-9, where additional economic variables are used to construct forecasts, whether future price changes can be predicted by past price changes alone is still a subject of controversy and empirical investigation.

In Section 2.1 we review the various versions of the random walk hypothesis and develop tests for each of these versions in Sections 2.2-2.4. Long-horizon returns play a special role in detecting certain violations of the random walk and we explore some of their advantages and disadvantages in Section 2.5. Focusing on long-horizon returns leads naturally to the notion of long-range dependence, and a test for this phenomenon is presented in Section 2.6. For completeness, we provide a brief discussion of tests for unit roots, which are sometimes confused with tests of the random walk. In Section 2.8 we present several empirical illustrations that document important departures from the random walk hypothesis for recent US stock market data.


2.1 The Random Walk Hypotheses 


A useful way to organize the various versions of the random walk and martingale models that we shall present below is to consider the various kinds of dependence that can exist between an asset's returns $r_t$ and $r_{t+k}$ at two dates $t$ and $t+k$. To do this, define the random variables $f(r_t)$ and $g(r_{t+k})$ where $f(\cdot)$ and $g(\cdot)$ are two arbitrary functions, and consider the situations in which

$$\mathrm{Cov}[f(r_t), g(r_{t+k})] = 0 \qquad (2.1.1)$$

for all $t$ and for $k \neq 0$. For appropriately chosen $f(\cdot)$ and $g(\cdot)$, virtually all versions of the random walk and martingale hypotheses are captured by (2.1.1), which may be interpreted as an orthogonality condition.

For example, if $f(\cdot)$ and $g(\cdot)$ are restricted to be arbitrary linear functions, then (2.1.1) implies that returns are serially uncorrelated, corresponding to the Random Walk 3 model described in Section 2.1.3 below. Alternatively, if $f(\cdot)$ is unrestricted but $g(\cdot)$ is restricted to be linear, then (2.1.1) is equivalent to the martingale hypothesis described in Section 2.1. Finally, if (2.1.1) holds for all functions $f(\cdot)$ and $g(\cdot)$, this implies that returns are mutually independent, corresponding to the Random Walk 1 and Random Walk 2 models discussed in Sections 2.1.1 and 2.1.2, respectively. This classification is summarized in Table 2.1.
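A sample analogue of (2.1.1) is straightforward to compute for any pair of functions. The helper below is a hypothetical illustration, not a formal test: it returns the sample covariance of $f(r_t)$ and $g(r_{t+k})$ with a $1/(T-k)$ divisor and no standard error. Choosing linear $f$ and $g$ probes Random Walk 3; keeping $g$ linear but letting $f$ be nonlinear (say $f(r) = r^2$) probes the martingale restriction; letting both be nonlinear probes independence. The return series in the usage lines is made up.

```python
from typing import Callable, Sequence

def identity(x: float) -> float:
    return x

def lagged_cross_cov(r: Sequence[float], k: int,
                     f: Callable[[float], float] = identity,
                     g: Callable[[float], float] = identity) -> float:
    """Sample Cov[f(r_t), g(r_{t+k})] over the T - k available pairs."""
    if not 0 < k < len(r):
        raise ValueError("k must satisfy 0 < k < len(r)")
    a = [f(x) for x in r[:-k]]     # f(r_t),     t = 1, ..., T - k
    b = [g(x) for x in r[k:]]      # g(r_{t+k}), t = 1, ..., T - k
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

returns = [0.01, -0.02, 0.015, -0.005, 0.03, -0.01]   # made-up numbers for illustration
print(lagged_cross_cov(returns, 1))                                     # linear f and g
print(lagged_cross_cov(returns, 1, f=lambda x: x * x))                  # nonlinear f, linear g
print(lagged_cross_cov(returns, 1, lambda x: x * x, lambda x: x * x))   # both nonlinear
```

A value near zero for one choice of $f$ and $g$ says nothing about other choices, which is exactly why the hypotheses in Table 2.1 are nested rather than equivalent.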

Although there are several other ways to characterize the various ran- 
dom walk and martingale models, condition (2.1.1) and Table 2.1 are partic- 
ularly relevant for economic hypotheses since almost all equilibrium asset 
pricing models can be reduced to a set of orthogonality conditions. This 
interpretation is explored extensively in Chapters 8 and 12. 


The Martingale Model

Perhaps the earliest model of financial asset prices was the martingale model, whose origin lies in the history of games of chance and the birth of probability theory. The prominent Italian mathematician Girolamo Cardano proposed an elementary theory of gambling in his 1565 manuscript Liber de


Table 2.1. Classification of random walk and martingale hypotheses. 


Cov[f(r_t), g(r_{t+k})] = 0    ∀ g(·) linear                  ∀ g(·)

∀ f(·) linear                  Uncorrelated Increments,       n/a
                               Random Walk 3:
                               Proj[r_{t+k} | r_t] = μ

∀ f(·)                         Martingale/Fair Game:          Independent Increments,
                               E[r_{t+k} | r_t] = μ           Random Walks 1 and 2:
                                                              pdf(r_{t+k} | r_t) = pdf(r_{t+k})


“Proj[y | x]” denotes the linear projection of y onto x, and pdf(·) denotes the probability density function of its argument.
Ludo Aleae (The Book of Games of Chance), in which he wrote:¹


The most fundamental principle of all in gambling is simply equal conditions, e.g., of opponents, of bystanders, of money, of situation, of the dice box, and of the die itself. To the extent to which you depart from that equality, if it is in your opponent's favour, you are a fool, and if in your own, you are unjust.


This passage clearly contains the notion of a fair game, a game which is neither in your favor nor your opponent's, and this is the essence of a martingale, a stochastic process $\{P_t\}$ which satisfies the following condition:

$$E[P_{t+1} \mid P_t, P_{t-1}, \ldots] = P_t, \qquad (2.1.2)$$

or, equivalently,

$$E[P_{t+1} - P_t \mid P_t, P_{t-1}, \ldots] = 0. \qquad (2.1.3)$$


If $P_t$ represents one's cumulative winnings or wealth at date $t$ from playing some game of chance each period, then a fair game is one for which the expected wealth next period is simply equal to this period's wealth (see (2.1.2)), conditioned on the history of the game. Alternatively, a game is fair if the expected incremental winnings at any stage are zero when conditioned on the history of the game (see (2.1.3)).

If $P_t$ is taken to be an asset's price at date $t$, then the martingale hypothesis states that tomorrow's price is expected to be equal to today's price, given the asset's entire price history. Alternatively, the asset's expected price change is zero when conditioned on the asset's price history; hence its price is just as likely to rise as it is to fall. From a forecasting perspective, the martingale hypothesis implies that the “best” forecast of tomorrow's price is simply today's price, where “best” means minimal mean-squared error (see Chapter 7).
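As a small numerical illustration of this mean-squared-error property, the sketch below simulates a driftless Gaussian random walk, which is one particular martingale (the Gaussian increments, the seed, and all parameter values are assumptions made only for the example), and compares the naive forecast $P_t$ with a hypothetical "momentum" forecast $P_t + (P_t - P_{t-1})$ that extrapolates the last price change.

```python
import random

def forecast_mse(n_steps=20000, sigma=1.0, seed=0):
    """Mean-squared errors of two one-step-ahead forecasts of a driftless
    random walk (a martingale). Gaussian increments are an assumption."""
    rng = random.Random(seed)
    prices = [0.0]
    for _ in range(n_steps):
        prices.append(prices[-1] + rng.gauss(0.0, sigma))

    naive_sse, momentum_sse, count = 0.0, 0.0, 0
    for t in range(1, n_steps):
        actual = prices[t + 1]
        naive = prices[t]                                   # martingale forecast: today's price
        momentum = prices[t] + (prices[t] - prices[t - 1])  # hypothetical trend-following forecast
        naive_sse += (actual - naive) ** 2
        momentum_sse += (actual - momentum) ** 2
        count += 1
    return naive_sse / count, momentum_sse / count

naive_mse, momentum_mse = forecast_mse()
print(f"naive MSE = {naive_mse:.3f}, momentum MSE = {momentum_mse:.3f}")
```

With unit-variance increments the naive forecast error is $\epsilon_{t+1}$ (MSE near 1), while the momentum error is $\epsilon_{t+1} - \epsilon_t$ (MSE near 2), so extrapolating past changes only adds noise.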

Another aspect of the martingale hypothesis is that nonoverlapping price changes are uncorrelated at all leads and lags, which implies the ineffectiveness of all linear forecasting rules for future price changes based on historical prices alone. The fact that so sweeping an implication could follow from as simple a model as (2.1.2) foreshadows the important role that the martingale hypothesis will play in the modeling of asset price dynamics (see the discussion below and Chapter 7).

In fact, the martingale was long considered to be a necessary condition for an efficient asset market, one in which the information contained in past prices is instantly, fully, and perpetually reflected in the asset's current price.² If the market is efficient, then it should not be possible to profit by trading on


¹ See Hald (1990, Chapter 4) for further details.
² See Samuelson (1965, 1972, 1973). Roberts (1967) calls the martingale hypothesis weak-form market efficiency. He also defines an asset market to be semistrong-form and strong-form




the information contained in the asset's price history; hence the conditional expectation of future price changes, conditional on the price history, cannot be either positive or negative (if short sales are feasible) and therefore must be zero. This notion of efficiency has a wonderfully counterintuitive and seemingly contradictory flavor to it: The more efficient the market, the more random is the sequence of price changes generated by the market, and the most efficient market of all is one in which price changes are completely random and unpredictable.

However, one of the central tenets of modern financial economics is the necessity of some trade-off between risk and expected return, and although the martingale hypothesis places a restriction on expected returns, it does not account for risk in any way. In particular, if an asset's expected price change is positive, it may be the reward necessary to attract investors to hold the asset and bear its associated risks. Therefore, despite the intuitive appeal that the fair-game interpretation might have, it has been shown that the martingale property is neither a necessary nor a sufficient condition for rationally determined asset prices (see, for example, LeRoy [1973], Lucas [1978], and Chapter 8).

Nevertheless, the martingale has become a powerful tool in probability and statistics and also has important applications in modern theories of asset prices. For example, once asset returns are properly adjusted for risk, the martingale property does hold (see Lucas [1978], Cox and Ross [1976], Harrison and Kreps [1979]). In particular, we shall see in Chapter 8 that marginal-utility-weighted prices do follow martingales under quite general conditions. This risk-adjusted martingale property has led to a veritable revolution in the pricing of complex financial instruments such as options, swaps, and other derivative securities (see Chapters 9, 12, and Merton [1990], for example). Moreover, the martingale led to the development of a closely related model that has now become an integral part of virtually every scientific discipline concerned with dynamics: the random walk hypothesis.


2.1.1 The Random Walk 1: IID Increments


Perhaps the simplest version of the random walk hypothesis is the independently and identically distributed (IID) increments case in which the dynamics of $\{P_t\}$ are given by the following equation:

$$ P_t = \mu + P_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathrm{IID}(0, \sigma^2) \tag{2.1.4} $$

where $\mu$ is the expected price change or drift, and $\mathrm{IID}(0, \sigma^2)$ denotes that $\epsilon_t$ is independently and identically distributed with mean 0 and variance $\sigma^2$. The


efficient if the conditional expectation of future price changes is zero, conditioned on all available public information, and all available public and private information, respectively. See Chapter 1 for further discussion of these concepts.


independence of the increments $\{\epsilon_t\}$ implies that the random walk is also a fair game, but in a much stronger sense than the martingale: Independence implies not only that increments are uncorrelated, but that any nonlinear functions of the increments are also uncorrelated. We shall call this the Random Walk 1 model or RW1.

To develop some intuition for RW1, consider its conditional mean and variance at date $t$, conditional on some initial value $P_0$ at date 0:

$$ E[P_t \mid P_0] = P_0 + \mu t \tag{2.1.5} $$

$$ \mathrm{Var}[P_t \mid P_0] = \sigma^2 t \tag{2.1.6} $$


which follows from recursive substitution of lagged $P_t$ in (2.1.4), namely $P_t = P_0 + \mu t + \sum_{k=1}^{t} \epsilon_k$, together with the IID increments assumption. From (2.1.5) and (2.1.6) it is apparent that the random walk is nonstationary and that its conditional mean and variance are both linear in time. These implications also hold for the two other forms of the random walk hypothesis (RW2 and RW3) described below.
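Equations (2.1.5) and (2.1.6) can be checked by Monte Carlo. The sketch below assumes Gaussian increments (RW1 itself only requires IID increments) and uses arbitrary illustrative values of $P_0$, $\mu$, $\sigma$, and $t$; the function name is hypothetical.

```python
import random
import statistics

def simulate_rw1_endpoints(p0, mu, sigma, t, n_paths, seed=1):
    """Draw P_t for many independent paths of P_t = mu + P_{t-1} + eps_t.
    Gaussian eps_t is an assumption made for this illustration only."""
    rng = random.Random(seed)
    endpoints = []
    for _ in range(n_paths):
        p = p0
        for _ in range(t):
            p = mu + p + rng.gauss(0.0, sigma)   # one step of (2.1.4)
        endpoints.append(p)
    return endpoints

p0, mu, sigma, t = 100.0, 0.5, 2.0, 50
ends = simulate_rw1_endpoints(p0, mu, sigma, t, n_paths=4000)
print("sample mean", statistics.fmean(ends), "theory", p0 + mu * t)          # (2.1.5)
print("sample var ", statistics.pvariance(ends), "theory", sigma ** 2 * t)   # (2.1.6)
```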

Perhaps the most common distributional assumption for the innovations or increments $\epsilon_t$ is normality. If the $\epsilon_t$'s are IID $\mathcal{N}(0, \sigma^2)$, then (2.1.4) is equivalent to an arithmetic Brownian motion, sampled at regularly spaced unit intervals (see Section 9.1 in Chapter 9). This distributional assumption simplifies many of the calculations surrounding the random walk, but suffers from the same problem that afflicts normally distributed returns: violation of limited liability. If the conditional distribution of $P_t$ is normal, then there will always be a positive probability that $P_t < 0$.
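Under (2.1.4) with normal increments, (2.1.5) and (2.1.6) imply $P_t \mid P_0 \sim \mathcal{N}(P_0 + \mu t, \sigma^2 t)$, so the probability of a negative price is $\Phi(-(P_0 + \mu t)/(\sigma\sqrt{t}))$. The short sketch below evaluates it with Python's `statistics.NormalDist`; the starting price and parameters are illustrative assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist

def prob_negative_price(p0, mu, sigma, t):
    """Pr(P_t < 0 | P_0) when P_t | P_0 ~ N(P_0 + mu*t, sigma^2 * t)."""
    mean = p0 + mu * t
    sd = sigma * t ** 0.5
    return NormalDist(mean, sd).cdf(0.0)

for t in (1, 25, 100, 400):
    print(t, prob_negative_price(p0=10.0, mu=0.0, sigma=1.0, t=t))
```

With no drift the probability grows steadily with the horizon; at $t = 25$ the mean is two standard deviations above zero, giving roughly 0.023.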

To avoid violating limited liability, we may use the same device as in Section 1.4.2, namely, to assert that the natural logarithm of prices $p_t \equiv \log P_t$ follows a random walk with normally distributed increments; hence

$$ p_t = \mu + p_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathrm{IID}\ \mathcal{N}(0, \sigma^2). \tag{2.1.7} $$


This implies that continuously compounded returns are IID normal variates with mean $\mu$ and variance $\sigma^2$, which yields the lognormal model of Bachelier (1900) and Einstein (1905). We shall return to this in Section 9.1 of Chapter 9.
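A minimal sketch of (2.1.7): simulate log prices, exponentiate them to obtain prices (which are then positive by construction), and recover continuously compounded returns as first differences of log prices. The daily-scale drift and volatility below are illustrative assumptions; a small drift keeps $\exp(p_t)$ within floating-point range over a long simulation.

```python
import math
import random
import statistics

def simulate_log_rw(p0, mu, sigma, n, seed=2):
    """Simulate p_t = mu + p_{t-1} + eps_t with eps_t ~ IID N(0, sigma^2).
    Parameter values passed in below are illustrative assumptions."""
    rng = random.Random(seed)
    log_prices = [math.log(p0)]
    for _ in range(n):
        log_prices.append(mu + log_prices[-1] + rng.gauss(0.0, sigma))
    prices = [math.exp(lp) for lp in log_prices]                       # P_t = exp(p_t) > 0
    returns = [log_prices[i + 1] - log_prices[i] for i in range(n)]    # r_t = p_t - p_{t-1}
    return prices, returns

prices, returns = simulate_log_rw(p0=50.0, mu=0.0004, sigma=0.01, n=20000)
print("all prices positive:", min(prices) > 0.0)
print("mean and variance of r_t:", statistics.fmean(returns), statistics.pvariance(returns))
```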


2.1.2 The Random Walk 2: Independent Increments 


Despite the elegance and simplicity of RW1, the assumption of identically distributed increments is not plausible for financial asset prices over long time spans. For example, over the two-hundred-year history of the New York Stock Exchange, there have been countless changes in the economic, social, technological, institutional, and regulatory environment in which stock prices are determined. The assertion that the probability law of daily stock returns has remained the same over this two-hundred-year period is simply implausible. Therefore, we relax the assumptions of RW1 to include processes with independent but not identically distributed (INID) increments, and we shall call this the Random Walk 2 model or RW2. RW2 clearly contains RW1 as a special case, but also contains considerably more general price processes. For example, RW2 allows for unconditional heteroskedasticity in the $\epsilon_t$'s, a particularly useful feature given the time-variation in volatility of many financial asset return series (see Section 12.2 in Chapter 12).

Although RW2 is weaker than RW1 (see Table 2.1), it still retains the most interesting economic property of the IID random walk: Any arbitrary transformation of future price increments is unforecastable using any arbitrary transformation of past price increments.


2.1.3 The Random Walk 3: Uncorrelated Increments 


An even more general version of the random walk hypothesis (the one most often tested in the recent empirical literature) may be obtained by relaxing the independence assumption of RW2 to include processes with dependent but uncorrelated increments. This is the weakest form of the random walk hypothesis, which we shall refer to as the Random Walk 3 model or RW3, and contains RW1 and RW2 as special cases. A simple example of a process that satisfies the assumptions of RW3 but not of RW1 or RW2 is any process for which $\mathrm{Cov}[\epsilon_t, \epsilon_{t-k}] = 0$ for all $k \neq 0$, but where $\mathrm{Cov}[\epsilon_t^2, \epsilon_{t-k}^2] \neq 0$ for some $k \neq 0$. Such a process has uncorrelated increments, but is clearly not independent since its squared increments are correlated (see Section 12.2 in Chapter 12 for specific examples).
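To make the RW3 example concrete, the sketch below simulates an ARCH(1)-type process, $\epsilon_t = \sigma_t z_t$ with $\sigma_t^2 = \omega + a\,\epsilon_{t-1}^2$ and $z_t$ IID standard normal. This particular process, its parameters, and the helper names are illustrative assumptions rather than anything prescribed in the text. Its increments are serially uncorrelated, but their squares are not.

```python
import random

def simulate_arch1(n, omega=1.0, a=0.25, seed=3):
    """Illustrative ARCH(1): eps_t = sigma_t * z_t, sigma_t^2 = omega + a * eps_{t-1}^2."""
    rng = random.Random(seed)
    eps, prev = [], 0.0
    for _ in range(n):
        sigma_t = (omega + a * prev ** 2) ** 0.5
        prev = sigma_t * rng.gauss(0.0, 1.0)
        eps.append(prev)
    return eps

def autocorr(x, k):
    """Sample autocorrelation of the sequence x at lag k."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den

eps = simulate_arch1(50000)
print("lag-1 autocorrelation of eps:  ", autocorr(eps, 1))
print("lag-1 autocorrelation of eps^2:", autocorr([e * e for e in eps], 1))
```

For this process the population lag-1 autocorrelation of $\epsilon_t^2$ equals $a$, so the second number should be near 0.25 while the first is near zero.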


2.2 Tests of Random Walk 1: IID Increments


Although RW1 is implausible on a priori theoretical grounds, tests of RW1 nevertheless provide a great deal of intuition about the behavior of the random walk. For example, we shall see in Section 2.2.2 that the drift of a random walk can sometimes be misinterpreted as predictability if not properly accounted for. Before turning to those issues, we begin with a brief review of traditional statistical tests for the IID assumptions in Section 2.2.1.


2.2.1 Traditional Statistical Tests 


Since the assumptions of IID are so central to classical statistical inference, it should come as no surprise that tests for these two assumptions have a long and illustrious history in statistics, with considerably broader applications than to the random walk. Because of their breadth and ubiquity, it is virtually impossible to catalog all tests of IID in any systematic fashion, and we shall mention only a few of the best-known tests.

Since IID are properties of random variables that are not specific to a particular parametric family of distributions, many of these tests fall under the rubric of nonparametric tests. Some examples are the Spearman rank correlation test, Spearman's footrule test, the Kendall $\tau$ correlation test, and other tests based on linear combinations of ranks or R-statistics (see Randles and Wolfe [1979] and Serfling [1980]). By using information contained solely in the ranks of the observations, it is possible to develop tests of IID that are robust across parametric families and invariant to changes in units of measurement. Exact sampling theories for such statistics are generally available but cumbersome, involving transformations of the (discrete) uniform distribution over the set of permutations of the ranks. However, for most of these statistics, normal asymptotic approximations to the sampling distributions have been developed (see Serfling [1980]).

More recent techniques based on the empirical distribution function of the data have also been used to construct tests of IID. These tests often require slightly stronger assumptions on the joint and marginal distribution functions of the data-generating process; hence they fall into the class of semiparametric tests. Typically, such tests form a direct comparison between the joint and marginal empirical distribution functions or an indirect comparison using the quantiles of the two. For these test statistics, exact sampling theories are generally unavailable, and we must rely on asymptotic approximations to perform the tests (see Shorack and Wellner [1986]).

Under parametric assumptions, tests of IID are generally easier to construct. For example, to test for independence among $k$ vectors which are jointly normally distributed, several statistics may be used: the likelihood ratio statistic, the canonical correlation, eigenvalues of the covariance matrices, etc. (see Muirhead [1983]). Of course, the tractability of such tests must be traded off against their dependence on specific parametric assumptions. Although these tests are often more powerful than their nonparametric counterparts, even small departures from the hypothesized parametric family can lead to large differences between the actual and nominal sizes of the tests in finite samples.


2.2.2 Sequences and Reversals, and Runs 


The early tests of the random walk hypothesis were largely tests of RW1 and RW2. Although they are now primarily of historical interest, we can nevertheless learn a great deal about the properties of the random walk from such tests. Moreover, several recently developed econometric tools rely heavily on RW1 (see, for example, Sections 2.5 and 2.6); hence a discussion of these tests also provides us with an opportunity to develop some machinery that we shall require later.


Sequences and Reversals 
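We begin with the logarithmic version of RW1 or geometric Brownian motion in which the log price process $p_t$ is assumed to follow an IID random walk without drift:

$$ p_t = p_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathrm{IID}(0, \sigma^2) \tag{2.2.1} $$

and denote by $I_t$ the following random variable:

$$ I_t = \begin{cases} 1 & \text{if } r_t \equiv p_t - p_{t-1} > 0 \\ 0 & \text{if } r_t \equiv p_t - p_{t-1} \le 0. \end{cases} \tag{2.2.2} $$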
Much like the classical Bernoulli coin-toss, $I_t$ indicates whether the date-$t$ continuously compounded return $r_t$ is positive or negative. In fact, the coin-tossing analogy is quite appropriate, as many of the original tests of RW1 were based on simple coin-tossing probabilities.

One of the first tests of RW1 was proposed by Cowles and Jones (1937) and consists of a comparison of the frequency of sequences and reversals in historical stock returns, where the former are pairs of consecutive returns with the same sign, and the latter are pairs of consecutive returns with opposite signs. Specifically, given a sample of $n+1$ returns $r_1, \ldots, r_{n+1}$, the number of sequences $N_s$ and reversals $N_r$ may be expressed as simple functions of the $I_t$'s:

$$ N_s \equiv \sum_{t=1}^{n} Y_t, \qquad Y_t \equiv I_t I_{t+1} + (1 - I_t)(1 - I_{t+1}) \tag{2.2.3} $$

$$ N_r = n - N_s. \tag{2.2.4} $$


If log prices follow a driftless IID random walk (2.2.1), and if we add the further restriction that the distribution of the increments $\epsilon_t$ is symmetric, then whether $r_t$ is positive or negative should be equally likely: a fair coin-toss with probability one-half of either outcome. This implies that for any pair of consecutive returns, a sequence and a reversal are equally probable; hence the Cowles-Jones ratio $\widehat{CJ} \equiv N_s/N_r$ should be approximately equal to one. More formally, this ratio may be interpreted as a consistent estimator of the ratio $CJ$ of the probability $\pi_s$ of a sequence to the probability of a reversal $1 - \pi_s$, since:

$$ \widehat{CJ} = \frac{N_s}{N_r} = \frac{N_s/n}{1 - N_s/n} = \frac{\hat{\pi}_s}{1 - \hat{\pi}_s} \stackrel{p}{\longrightarrow} \frac{\pi_s}{1 - \pi_s} = CJ $$


where $\stackrel{p}{\longrightarrow}$ denotes convergence in probability. The fact that this ratio exceeded one for many historical stock return series led Cowles and Jones (1937) to conclude that this “represents conclusive evidence of structure in stock prices.”³
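The counting in (2.2.2) through (2.2.4) is easy to express directly. The function below is a minimal sketch (its name is hypothetical) that maps $n+1$ returns to $N_s$, $N_r$, and $\widehat{CJ}$, treating a zero return as $I_t = 0$ as in (2.2.2).

```python
def cowles_jones(returns):
    """Return (N_s, N_r, CJ_hat) for a list of n+1 returns."""
    ind = [1 if r > 0 else 0 for r in returns]            # I_t, equation (2.2.2)
    n = len(ind) - 1                                      # number of consecutive pairs
    n_s = sum(ind[t] * ind[t + 1] + (1 - ind[t]) * (1 - ind[t + 1])
              for t in range(n))                          # sequences, (2.2.3)
    n_r = n - n_s                                         # reversals, (2.2.4)
    cj_hat = n_s / n_r if n_r > 0 else float("inf")       # guard: no reversals observed
    return n_s, n_r, cj_hat

print(cowles_jones([0.01, 0.02, -0.01, -0.02, 0.03]))   # (2, 2, 1.0)
```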

However, the assumption of a zero drift is critical in determining the value of $CJ$. In particular, $CJ$ will exceed one for an IID random walk with drift, since a drift (either positive or negative) clearly makes sequences more likely than reversals. To see this, suppose that log prices follow a normal random walk with drift:

$$ p_t = \mu + p_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathrm{IID}\ \mathcal{N}(0, \sigma^2). $$


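Then the indicator variable $I_t$ is no longer a fair coin-toss but is biased in the direction of the drift, i.e.,

$$ I_t = \begin{cases} 1 & \text{with probability } \pi \\ 0 & \text{with probability } 1 - \pi, \end{cases} \tag{2.2.5} $$

where

$$ \pi \equiv \Pr(r_t > 0) = \Phi\left(\frac{\mu}{\sigma}\right). \tag{2.2.6} $$

Here $\Phi(\cdot)$ denotes the standard normal cumulative distribution function.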


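If the drift $\mu$ is positive then $\pi > \tfrac{1}{2}$, and if it is negative then $\pi < \tfrac{1}{2}$. Because, under independence, a sequence occurs when two consecutive returns are both positive or both negative, $\pi_s = \pi^2 + (1 - \pi)^2$ and $1 - \pi_s = 2\pi(1 - \pi)$; hence under this more general specification the ratio of $\pi_s$ to $1 - \pi_s$ is given by

$$ CJ = \frac{\pi_s}{1 - \pi_s} = \frac{\pi^2 + (1 - \pi)^2}{2\pi(1 - \pi)}. $$

As long as the drift is nonzero, it will always be the case that sequences are more likely than reversals, simply because a nonzero drift induces a trend in the process. It is only for the "fair-game" case of $\pi = \tfrac{1}{2}$ that $CJ$ achieves its lower bound of one.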

To see how large an effect a nonzero drift might have on $CJ$, suppose that $\mu = 0.08$ and $\sigma = 0.21$, values which correspond roughly to annual US stock return indexes over the last half-century. This yields the following estimate of $CJ$:

$$ \pi = \Phi\left(\frac{0.08}{0.21}\right) = 0.6484 $$

$$ \pi_s = \pi^2 + (1 - \pi)^2 = 0.5440 $$

$$ CJ = 1.19, $$

³ In a later study, Cowles (1960) corrects for biases in time-averaged price data and still finds CJ ratios in excess of one. However, his conclusion is somewhat more guarded: “... while our various analyses have disclosed a tendency towards persistence in stock price movements, in no case is this sufficient to provide more than negligible profits after payment of brokerage costs.”


which is close to the value of 1.17 that Cowles and Jones (1937, Table II) report for the annual returns of an index of railroad stocks from 1835 to 1935. Is the difference statistically significant?
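The figures 0.6484, 0.5440, and 1.19 can be reproduced from (2.2.6) and the expression for $CJ$ in a few lines. The sketch below uses `statistics.NormalDist` for $\Phi$; the function name is hypothetical.

```python
from statistics import NormalDist

def cj_with_drift(mu, sigma):
    """pi = Phi(mu/sigma), pi_s = pi^2 + (1-pi)^2, CJ = pi_s / (1 - pi_s)."""
    pi = NormalDist().cdf(mu / sigma)        # (2.2.6), standard normal CDF
    pi_s = pi ** 2 + (1 - pi) ** 2           # probability of a sequence
    return pi, pi_s, pi_s / (1 - pi_s)

pi, pi_s, cj = cj_with_drift(0.08, 0.21)
print(f"pi = {pi:.4f}, pi_s = {pi_s:.4f}, CJ = {cj:.2f}")
```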

To perform a formal comparison of the two values 1.19 and 1.17, we require a sampling theory for the estimator $\widehat{CJ}$. Such a theory may be obtained by noting from (2.2.3) that the estimator $N_s$ is a binomial random variable, i.e., the sum of $n$ Bernoulli random variables $Y_t$, where

$$ Y_t = \begin{cases} 1 & \text{with probability } \pi_s \equiv \pi^2 + (1 - \pi)^2 \\ 0 & \text{with probability } 1 - \pi_s; \end{cases} $$


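hence we may approximate the distribution of $N_s$ for large $n$ by a normal distribution with mean $E[N_s] = n\pi_s$ and variance $\mathrm{Var}[N_s]$. Because each pair of adjacent $Y_t$'s will be dependent,⁴ the variance of $N_s$ is not $n\pi_s(1 - \pi_s)$, the usual expression for the variance of a binomial random variable, but is instead

$$ \mathrm{Var}[N_s] = n\pi_s(1 - \pi_s) + 2n\,\mathrm{Cov}[Y_t, Y_{t+1}] = n\pi_s(1 - \pi_s) + 2n\left(\pi^3 + (1 - \pi)^3 - \pi_s^2\right), \tag{2.2.7} $$

where the covariance term follows because $Y_t Y_{t+1} = 1$ only when $I_t = I_{t+1} = I_{t+2}$, an event with probability $\pi^3 + (1 - \pi)^3$.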


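Applying a first-order Taylor approximation or the delta method (see Section A.4 of the Appendix) to $\widehat{CJ} = N_s/(n - N_s)$ using the normal asymptotic approximation for the distribution of $N_s$ then yields

$$ \widehat{CJ} \stackrel{a}{\sim} \mathcal{N}\left( \frac{\pi_s}{1 - \pi_s},\; \frac{\pi_s(1 - \pi_s) + 2\left(\pi^3 + (1 - \pi)^3 - \pi_s^2\right)}{n(1 - \pi_s)^4} \right) \tag{2.2.8} $$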


where $\stackrel{a}{\sim}$ indicates that the distributional relation is asymptotic. Since the Cowles and Jones (1937) estimate of 1.17 yields $\pi_s = 0.5392$ and $\pi = 0.6399$, with a sample size $n$ of 99 returns, (2.2.8) implies that the approximate standard error of the 1.17 estimate is 0.2537. Therefore, the estimate 1.17 is not statistically significantly different from 1.19. Moreover, under the null hypothesis $\pi = \tfrac{1}{2}$, $\widehat{CJ}$ has a mean of one and a standard deviation of 0.2010; hence neither 1.17 nor 1.19 is statistically distinguishable from one. This provides little evidence against the random walk hypothesis.
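The asymptotic moments in (2.2.8) translate directly into code. In this sketch (function name hypothetical), the variance of $N_s/n$ is taken from (2.2.7) and the delta-method factor is $1/(1-\pi_s)^4$. Under the null $\pi = \tfrac{1}{2}$ it gives the standard deviation $2/\sqrt{99} \approx 0.2010$; for $\pi = 0.6399$ it gives a standard error of about 0.25, in line with the value reported above.

```python
import math

def cj_asymptotic(pi, n):
    """Asymptotic mean and standard error of CJ_hat from (2.2.7)-(2.2.8)."""
    pi_s = pi ** 2 + (1 - pi) ** 2
    cov_adjacent = pi ** 3 + (1 - pi) ** 3 - pi_s ** 2     # Cov[Y_t, Y_{t+1}]
    var_ns_over_n = pi_s * (1 - pi_s) + 2 * cov_adjacent   # Var[N_s] / n
    mean = pi_s / (1 - pi_s)
    se = math.sqrt(var_ns_over_n / (n * (1 - pi_s) ** 4))  # delta method
    return mean, se

mean_null, se_null = cj_asymptotic(0.5, 99)
mean_cj, se_cj = cj_asymptotic(0.6399, 99)
print(mean_null, se_null)      # 1.0 and about 0.201
print(mean_cj, se_cj)          # about 1.17 and about 0.25
print("z, 1.17 vs 1.19:", (1.19 - 1.17) / se_cj)
print("z, 1.17 vs 1 under the null:", (1.17 - 1.0) / se_null)
```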
On the other hand, suppose the random walk hypothesis were false: would this be detectable by the CJ statistic? To see how departures from the random walk might affect the ratio $CJ$, let the indicator $I_t$ be the following

⁴ In fact, $Y_t$ is a two-state Markov chain with probabilities $\Pr(Y_t = 1 \mid Y_{t-1} = 1) = [\pi^3 + (1 - \pi)^3]/[\pi^2 + (1 - \pi)^2]$ and $\Pr(Y_t = 0 \mid Y_{t-1} = 0) = \tfrac{1}{2}$.

two-state Markov chain:

$$ \begin{array}{c|cc} & I_{t+1} = 1 & I_{t+1} = 0 \\ \hline I_t = 1 & 1 - \alpha & \alpha \\ I_t = 0 & \beta & 1 - \beta \end{array} \tag{2.2.9} $$

where $\alpha$ denotes the conditional probability that $r_{t+1}$ is negative, conditional on a positive $r_t$, and $\beta$ denotes the conditional probability that $r_{t+1}$ is positive, conditional on a negative $r_t$. If $\alpha = 1 - \beta$, this reduces to the case examined above (set $\pi = \beta$): the IID random walk with drift. As long as $\alpha \neq 1 - \beta$, the $I_t$'s (and hence the $r_t$'s) will be serially correlated, violating RW1. In this case, the theoretical value of the ratio $CJ$ is given by


$$ CJ = \frac{(1 - \alpha)\beta + (1 - \beta)\alpha}{2\alpha\beta}, \tag{2.2.10} $$

which can take on any nonnegative real value, as illustrated by the following table.

      α \ β   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90   1.00
      0.10    9.00   6.50   5.67   5.25   5.00   4.83   4.71   4.63   4.56   4.50
      0.20    6.50   4.00   3.17   2.75   2.50   2.33   2.21   2.13   2.06   2.00
      0.30    5.67   3.17   2.33   1.92   1.67   1.50   1.38   1.29   1.22   1.17
      0.40    5.25   2.75   1.92   1.50   1.25   1.08   0.96   0.87   0.81   0.75
      0.50    5.00   2.50   1.67   1.25   1.00   0.83   0.71   0.63   0.56   0.50
      0.60    4.83   2.33   1.50   1.08   0.83   0.67   0.55   0.46   0.39   0.33
      0.70    4.71   2.21   1.38   0.96   0.71   0.55   0.43   0.34   0.27   0.21
      0.80    4.63   2.13   1.29   0.87   0.63   0.46   0.34   0.25   0.18   0.12
      0.90    4.56   2.06   1.22   0.81   0.56   0.39   0.27   0.18   0.11   0.06
      1.00    4.50   2.00   1.17   0.75   0.50   0.33   0.21   0.12   0.06   0.00


As $\alpha$ and $\beta$ both approach one, the likelihood of reversals increases and hence $CJ$ approaches 0. As either $\alpha$ or $\beta$ approaches zero, the likelihood of sequences increases and $CJ$ increases without bound. In such cases, $CJ$ is clearly a reasonable indicator of departures from RW1. However, note that there exist combinations of $(\alpha, \beta)$ for which $\alpha \neq 1 - \beta$ and $CJ = 1$, e.g., $(\alpha, \beta) = (\tfrac{1}{3}, 1)$; hence the CJ statistic cannot distinguish these cases from RW1 (see Problem 2.3 for further discussion).
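Equation (2.2.10) simplifies to $CJ = 1/(2\alpha) + 1/(2\beta) - 1$, which makes the table easy to regenerate. The sketch below computes the theoretical value and, as a sanity check, estimates $CJ$ from a simulated path of the Markov chain (2.2.9); the simulation length, seed, starting state, and function names are illustrative assumptions.

```python
import random

def cj_markov(alpha, beta):
    """Theoretical CJ ratio (2.2.10) for the two-state Markov chain (2.2.9)."""
    return ((1 - alpha) * beta + (1 - beta) * alpha) / (2 * alpha * beta)

def simulate_cj_markov(alpha, beta, n, seed=4):
    """Estimate CJ = N_s / N_r from n transitions of the chain, starting at I = 1."""
    rng = random.Random(seed)
    state, n_s = 1, 0
    for _ in range(n):
        if state == 1:
            nxt = 0 if rng.random() < alpha else 1   # alpha: Pr(negative next | positive now)
        else:
            nxt = 1 if rng.random() < beta else 0    # beta: Pr(positive next | negative now)
        n_s += (nxt == state)                        # a sequence when the sign repeats
        state = nxt
    return n_s / (n - n_s)

# Rows alpha = 0.1, ..., 1.0 and columns beta = 0.1, ..., 1.0
grid = [i / 10 for i in range(1, 11)]
for a in grid:
    print(f"{a:.2f}", " ".join(f"{cj_markov(a, b):5.2f}" for b in grid))

sim = simulate_cj_markov(0.3, 0.4, 400_000)
print(cj_markov(0.3, 0.4), sim)
```

Entries that fall exactly on a rounding boundary (for example, 0.875 at $\alpha = 0.4$, $\beta = 0.8$) may print with a different last digit than the table accompanying (2.2.10), depending on floating-point rounding.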


Runs 

Another common test for RW1 is the runs test, in which the number of sequences of consecutive positive and negative returns, or runs, is tabulated and compared against its sampling distribution under the random walk hypothesis. For example, using the indicator variable $I_t$ defined in (2.2.2), a particular sequence of 10 returns may be represented by 1001110100, containing three runs of 1s (of length 1, 3, and 1, respectively) and three runs of 0s (of length 2, 1, and 2, respectively), thus six runs in total. In contrast, the sequence 0000011111 contains the same number of 0s and 1s, but only 2 runs. By comparing the number of runs in the data with the expected number of runs under RW1, a test of the IID random walk hypothesis may be constructed. To perform the test, we require the sampling distribution of the total number of runs $N_{\text{runs}}$ in a sample of $n$. Mood (1940) was the first to provide a comprehensive analysis of runs, and we shall provide a brief summary of his most general results here.
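Counting runs is straightforward with `itertools.groupby`, which groups consecutive equal elements. The sketch below (function name hypothetical) returns the total number of runs and the count by type, reproducing the two ten-observation examples above.

```python
from itertools import groupby

def count_runs(seq):
    """Total number of runs and number of runs of each type in seq."""
    by_type = {}
    for symbol, _ in groupby(seq):           # one group per maximal run
        by_type[symbol] = by_type.get(symbol, 0) + 1
    return sum(by_type.values()), by_type

print(count_runs("1001110100"))   # (6, {'1': 3, '0': 3})
print(count_runs("0000011111"))   # (2, {'0': 1, '1': 1})
```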

Suppose that each of $n$ IID observations takes on one of $k$ possible values with probability $\pi_i$, $i = 1, \ldots, k$ (hence $\sum_i \pi_i = 1$). In the case of the indicator variable $I_t$ defined in (2.2.2), $k$ is equal to 2; we shall return to this special case below. Denote by $N_{\text{runs}}(i)$ the total number of runs of type $i$ (of any length), $i = 1, \ldots, k$; hence the total number of runs $N_{\text{runs}} \equiv \sum_{i=1}^{k} N_{\text{runs}}(i)$. Using combinatorial arguments and the properties of the multinomial distribution, Mood (1940) derives the discrete distribution of $N_{\text{runs}}(i)$, from which he calculates the following moments:


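$$ E[N_{\text{runs}}(i)] = n\pi_i(1 - \pi_i) + \pi_i^2 \tag{2.2.11} $$

$$ \mathrm{Var}[N_{\text{runs}}(i)] = n\pi_i(1 - 4\pi_i + 6\pi_i^2 - 3\pi_i^3) + \pi_i^2(3 - 8\pi_i + 5\pi_i^2) \tag{2.2.12} $$

$$ \mathrm{Cov}[N_{\text{runs}}(i), N_{\text{runs}}(j)] = -n\pi_i\pi_j(1 - 2\pi_i - 2\pi_j + 3\pi_i\pi_j) - \pi_i\pi_j(2\pi_i + 2\pi_j - 5\pi_i\pi_j). \tag{2.2.13} $$

Moreover, Mood (1940) shows that the distribution of the number of runs converges to a normal distribution asymptotically when properly normalized. In particular, we have

$$ x_i \equiv \frac{N_{\text{runs}}(i) - n\pi_i(1 - \pi_i)}{\sqrt{n}} \stackrel{a}{\sim} \mathcal{N}\left(0,\; \pi_i(1 - 4\pi_i + 6\pi_i^2 - 3\pi_i^3)\right) \tag{2.2.14} $$

$$ \mathrm{Cov}[x_i, x_j] \stackrel{a}{=} -\pi_i\pi_j(1 - 2\pi_i - 2\pi_j + 3\pi_i\pi_j) \tag{2.2.15} $$

$$ \frac{N_{\text{runs}} - n\left(1 - \sum_{i=1}^{k} \pi_i^2\right)}{\sqrt{n}} \stackrel{a}{\sim} \mathcal{N}\left(0,\; \sum_{i=1}^{k} \pi_i^2 + 2\sum_{i=1}^{k} \pi_i^3 - 3\left(\sum_{i=1}^{k} \pi_i^2\right)^2\right) \tag{2.2.16} $$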


where $\stackrel{a}{=}$ indicates that the equality holds asymptotically. Tests of RW1 may then be performed using the asymptotic approximations (2.2.14) or

Table 2.2. Expected runs for a random walk with drift μ.

        n     μ (%)     π      E[N_runs]
    1,000       0     0.500     500.5
    1,000       2     0.538     497.6
    1,000       4     0.576     489.1
    1,000       6     0.612     475.2
    1,000       8     0.648     456.5
    1,000      10     0.683     433.6
    1,000      12     0.716     407.2
    1,000      14     0.748     378.1
    1,000      16     0.777     347.3
    1,000      18     0.804     315.5
    1,000      20     0.830     283.5

Expected total number of runs in a sample of n independent Bernoulli trials representing positive/negative continuously compounded returns for a Gaussian geometric Brownian motion with drift μ = 0%, ..., 20% and standard deviation σ = 21%.


(2.2.16), and the probabilities $\pi_i$ may be estimated directly from the data as the ratios $\hat{\pi}_i = n_i/n$, where $n_i$ is the number of observations in the sample of $n$ that are of the $i$th type; thus $n = \sum_i n_i$.
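As a check on (2.2.11) and (2.2.12), the sketch below compares Mood's formulas with a Monte Carlo estimate for one type in a three-valued IID sequence. The probabilities, sample size, replication count, and function names are illustrative assumptions.

```python
import random
import statistics
from itertools import groupby

def mood_mean_var(n, p):
    """E[N_runs(i)] and Var[N_runs(i)] from (2.2.11)-(2.2.12) for a type with probability p."""
    mean = n * p * (1 - p) + p ** 2
    var = n * p * (1 - 4 * p + 6 * p ** 2 - 3 * p ** 3) + p ** 2 * (3 - 8 * p + 5 * p ** 2)
    return mean, var

def simulate_runs_of_type(n, probs, target, reps, seed=5):
    """Monte Carlo mean and variance of the number of runs of type `target`."""
    rng = random.Random(seed)
    symbols = list(range(len(probs)))
    counts = []
    for _ in range(reps):
        draws = rng.choices(symbols, weights=probs, k=n)   # n IID draws
        counts.append(sum(1 for s, _ in groupby(draws) if s == target))
    return statistics.fmean(counts), statistics.pvariance(counts)

probs = [0.3, 0.5, 0.2]                  # hypothetical type probabilities
theory = mood_mean_var(50, probs[0])     # about (10.59, 3.98)
mc = simulate_runs_of_type(50, probs, target=0, reps=20000)
print(theory, mc)
```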

To develop some sense of the behavior of the total number of runs, consider the Bernoulli case $k = 2$ corresponding to the indicator variable $I_t$ defined in (2.2.2) of Section 2.2.2, where $\pi$ denotes the probability that $I_t = 1$. In this case, the expected total number of runs is

$$ E[N_{\text{runs}}] = 2n\pi(1 - \pi) + \pi^2 + (1 - \pi)^2. \tag{2.2.17} $$


Observe that for any $n > 1$, (2.2.17), which can be written as $1 + 2(n - 1)\pi(1 - \pi)$, is a globally concave quadratic function in $\pi$ on $[0, 1]$ which attains a maximum value of $(n + 1)/2$ at $\pi = \tfrac{1}{2}$. Therefore, a driftless random walk maximizes the expected total number of runs for any fixed sample size $n$; alternatively, the presence of a drift of either sign will decrease the expected total number of runs.

To see the sensitivity of $E[N_{\text{runs}}]$ with respect to the drift, in Table 2.2 we report the expected total number of runs for a sample of $n = 1{,}000$ observations for a geometric random walk with normally distributed increments, drift $\mu = 0\%, \ldots, 20\%$, and standard deviation $\sigma = 21\%$ (which is calibrated to match annual US stock index returns); hence $\pi = \Phi(\mu/\sigma)$. From Table 2.2 we see that as the drift increases, the expected total number of runs declines considerably, from 500.5 for zero drift to 283.5 for a 20% drift. However, all of these values are still consistent with the random walk hypothesis.
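Table 2.2 can be regenerated from (2.2.17) with $\pi = \Phi(\mu/\sigma)$. A minimal sketch, using the same $n = 1{,}000$ and $\sigma = 21\%$ as the table (function name hypothetical):

```python
from statistics import NormalDist

def expected_total_runs(n, pi):
    """E[N_runs] for n IID Bernoulli(pi) indicators, equation (2.2.17)."""
    return 2 * n * pi * (1 - pi) + pi ** 2 + (1 - pi) ** 2

n, sigma = 1000, 0.21
for mu_pct in range(0, 21, 2):
    pi = NormalDist().cdf((mu_pct / 100) / sigma)   # pi = Phi(mu / sigma)
    print(f"{mu_pct:3d}%  pi = {pi:.3f}  E[N_runs] = {expected_total_runs(n, pi):.1f}")
```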
To perform a test for the random walk in the Bernoulli case, we may calculate the following statistic:

$$ z = \frac{N_{\text{runs}} - 2n\pi(1 - \pi)}{2\sqrt{n\pi(1 - \pi)\left[1 - 3\pi(1 - \pi)\right]}} \stackrel{a}{\sim} \mathcal{N}(0, 1) $$


and perform the usual test of significance. A slight adjustment to this statistic is often made to account for the fact that while the normal approximation yields different probabilities for realizations in the interval $[N_{\text{runs}}, N_{\text{runs}} + 1)$, the exact probabilities are constant over this interval since $N_{\text{runs}}$ is integer-valued. Therefore, a continuity correction is made in which the z-statistic is evaluated at the midpoint of the interval (see Wallis and Roberts [1956]); thus


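$$ z = \frac{N_{\text{runs}} + \tfrac{1}{2} - 2n\pi(1 - \pi)}{2\sqrt{n\pi(1 - \pi)\left[1 - 3\pi(1 - \pi)\right]}} \stackrel{a}{\sim} \mathcal{N}(0, 1). $$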
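The runs test in the Bernoulli case is sketched below (function name hypothetical). By default it estimates $\pi$ by the sample fraction of 1s, which is one common choice rather than something the text prescribes, and it applies the continuity correction above unless told otherwise. It assumes $0 < \pi < 1$, since otherwise the denominator is zero.

```python
import math
from itertools import groupby

def runs_z(indicators, pi=None, continuity=True):
    """Runs-test z-statistic for a 0/1 sequence under the IID Bernoulli(pi) null."""
    n = len(indicators)
    if pi is None:
        pi = sum(indicators) / n                  # estimate pi from the data (an assumption)
    n_runs = sum(1 for _ in groupby(indicators))  # total number of runs
    q = pi * (1 - pi)
    numerator = n_runs + (0.5 if continuity else 0.0) - 2 * n * q
    denominator = 2 * math.sqrt(n * q * (1 - 3 * q))
    return numerator / denominator

seq = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
print(runs_z(seq, pi=0.5), runs_z(seq, pi=0.5, continuity=False))
```

For this ten-observation example there are 6 runs against $2n\pi(1-\pi) = 5$, so neither version of the statistic is anywhere near conventional critical values.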


Other aspects of runs have also been used to test the IID random walk, such as the distribution of runs by length and by sign. Indeed, Mood's (1940) seminal paper provides an exhaustive catalog of the properties of runs, including exact marginal and joint distributions, factorial moments, centered moments, and asymptotic approximations. An excellent summary of these results, along with a collection of related combinatorial problems in probability and statistics, is contained in David and Barton (1962). Fama (1965) presents an extensive empirical analysis of runs for US daily, four-day, nine-day, and sixteen-day stock returns from 1956 to 1962, and concludes that “...there is no evidence of important dependence from either an investment or a statistical point of view.”

More recent advances in the analysis of Markov chains have generalized the theory of runs to non-IID sequences, and by recasting patterns such as a run as elements of a permutation group, probabilities of very complex patterns may now be evaluated explicitly using the first-passage or hitting time of a random process defined on the permutation group. For these more recent results, see Aldous (1989), Aldous and Diaconis (1986), and Diaconis (1988).




2.3 Tests of Random Walk 2: Independent Increments


The restriction of identical distributions is clearly implausible, especially when applied to financial data that span several decades. However, testing for independence without assuming identical distributions is quite difficult, particularly for time series data. If we place no restrictions on how the marginal distributions of the data can vary through time, it becomes virtually impossible to conduct statistical inference since the sampling distributions of even the most elementary statistics cannot be derived.




Some of the nonparametric methods mentioned in Section 2.2.1, such as rank correlations, do test for independence without also requiring identical distributions, but the number of distinct marginal distributions is typically a finite and small number. For example, a test of independence between IQ scores and academic performance involves two distinct marginal distributions: one for IQ scores and the other for academic performance. Multiple observations are drawn from each marginal distribution, and various nonparametric tests can be designed to check whether the product of the marginal distributions equals the joint distribution of the paired observations. Such an approach obviously cannot succeed if we hypothesize a unique marginal distribution for each observation of IQ and academic performance.


Nevertheless, there are two lines of empirical research that can be viewed as a kind of “economic” test of RW2: filter rules and technical analysis. Although neither of these approaches makes much use of formal statistical inference, both have captured the interest of the financial community for practical reasons. This is not to say that statistical inference cannot be applied to these modes of analysis, but rather that the standards of evidence in this literature have evolved along very different paths. Therefore, we shall present only a cursory review of these techniques.


2.3.1 Filter Rules 


To test RW2, Alexander (1961, 1964) applied a filter rule in which an asset is purchased when its price increases by x%, and (short)sold when its price drops by x%. Such a rule is said to be an x% filter, and was proposed by Alexander (1961) for the following reasons:


Suppose we tentatively assume the existence of trends in stock market 
prices but believe them to be masked by the jiggling of the market. We 
might filter out all movements smaller than a specified size and examine 
the remaining movements. 


The total return of this dynamic portfolio strategy is then taken to be a measure of the predictability in asset returns. A comparison of the total return to the return from a buy-and-hold strategy for the Dow Jones and Standard and Poor's industrial averages led Alexander to conclude that “...there are trends in stock market prices....”

Fama (1965) and Fama and Blume (1966) present a more detailed empirical analysis of filter rules, correcting for dividends and trading costs, and conclude that such rules do not perform as well as the buy-and-hold strategy. In the absence of transactions costs, very small filters (1% in Alexander [1964] and between 0.5% and 1.5% in Fama and Blume [1966]) do yield superior returns, but because small filters generate considerably more frequent trading, Fama and Blume (1966) show that even a 0.1% roundtrip transaction cost is enough to eliminate the profits from such filter rules.
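An x% filter can be sketched as a small state machine. The version below is a simplified illustration, not a reconstruction of Alexander's or Fama and Blume's exact procedures: it starts flat, goes long after an x% rise from the running low, goes short after an x% fall from the running high, holds each position from $t$ to $t+1$ to avoid look-ahead, and charges a hypothetical proportional cost per unit change in position, measured in log-return units. The function names and the price path are hypothetical.

```python
import math

def filter_positions(prices, x):
    """Simplified x% filter: +1 long, -1 short, 0 flat before the first signal."""
    pos = 0
    low = high = prices[0]
    positions = []
    for p in prices:
        low, high = min(low, p), max(high, p)
        if pos <= 0 and p >= low * (1 + x):
            pos, high = 1, p          # buy after an x% rise from the running low
        elif pos >= 0 and p <= high * (1 - x):
            pos, low = -1, p          # sell short after an x% fall from the running high
        positions.append(pos)
    return positions

def filter_log_return(prices, x, cost=0.0):
    """Total log return of the filter; `cost` per unit of position change is an assumption."""
    pos = filter_positions(prices, x)
    total, prev = 0.0, 0
    for t in range(len(prices) - 1):
        total += pos[t] * math.log(prices[t + 1] / prices[t])   # position held from t to t+1
        total -= cost * abs(pos[t] - prev)                       # trading cost on changes
        prev = pos[t]
    return total

prices = [100, 102, 106, 103, 100, 97, 99, 104]    # hypothetical price path
print(filter_positions(prices, 0.05))               # [0, 0, 1, 1, -1, -1, -1, 1]
print(filter_log_return(prices, 0.05), math.log(prices[-1] / prices[0]))
```

On this short path the rule buys after the rise to 106 and is stopped out at 100, then shorts at 100 and is reversed at 104, so it loses on both trades even before costs, while buy-and-hold gains.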


2.3.2 Technical Analysis 


As a measure of predictability, the filter rule has the advantage of practical relevance: it is a specific and readily implementable trading strategy, and the metric of its success is total return. The filter rule is just one example of a much larger class of trading rules arising from technical analysis or charting. Technical analysis is an approach to investment management based on the belief that historical price series, trading volume, and other market statistics exhibit regularities (often, but not always, in the form of geometric patterns such as double bottoms, head-and-shoulders, and support and resistance levels) that can be profitably exploited to extrapolate future price movements (see, for example, Edwards and Magee [1966] and Murphy [1986]). In the words of Edwards and Magee (1966):


Technical analysis is the science of recording, usually in graphic form, the actual history of trading (price changes, volume of transactions, etc.) in a certain stock or in “the averages” and then deducing from that pictured history the probable future trend.


Historically, technical analysis has been the "black sheep" of the academic finance community. Regarded by many academics as a pursuit that lies somewhere between astrology and voodoo, technical analysis has never enjoyed the same degree of acceptance that, for example, fundamental analysis has received. This state of affairs persists today, even though the distinction between technical and fundamental analysis is becoming progressively fuzzier.⁵

Perhaps some of the prejudice against technical analysis can be attributed to semantics. Because fundamental analysis is based on quantities familiar to most financial economists (for example, earnings, dividends, and other balance-sheet and income-statement items), it possesses a natural bridge to the academic literature. In contrast, the vocabulary of the technical analyst is completely foreign to the academic and often mystifying
to the general public. Consider, for example, the following, which might 
be found in any recent academic finance journal: 


The magnitudes and decay pattern of the first twelve autocorrelations 
and the statistical significance of the Box-Pierce Q-statistic suggest the 
presence of a high-frequency predictable component in stock returns. 


⁵For example, many technical analysts no longer base their forecasts solely on past prices and volume but also use earnings and dividend information and other "fundamental" data, and many fundamental analysts now look at past price and volume patterns in addition to more traditional variables.

Contrast this with the statement:


The presence of clearly identified support and resistance levels, coupled 
with a one-third retracement parameter when prices lie between them, 
suggests the presence of strong buying and selling opportunities in the 
near term.


Both statements have the same meaning: Using historical prices, one can predict future prices to some extent in the short run. But because the two statements are so laden with jargon, the type of response they elicit depends very much on the individual reading them.

Despite the differences in jargon, recent empirical evidence suggests that technical analysis and more traditional financial analysis may have much in common (see, in particular, Section 2.8). Recent studies by Blume, Easley, and O'Hara (1994), Brock, Lakonishok, and LeBaron (1992), Brown and Jennings (1989), LeBaron (1996), Neftci (1991), Pau (1991), Taylor and Allen (1992), and Treynor and Ferguson (1985) signal a growing interest in technical analysis among financial academics, and so it may become a more active research area in the near future.


2.4 Tests of Random Walk 3: Uncorrelated Increments 


One of the most direct and intuitive tests of the random walk and martin- 
gale hypotheses for an individual time series is to check for serial correlation, 
correlation between two observations of the same series at different dates. 
Under the weakest version of the random walk, RW3, the increments or 
first-differences of the level of the random walk are uncorrelated at all leads 
and lags. Therefore, we may test RW3 by testing the null hypothesis that the
autocorrelation coefficients of the first-differences at various lags are all zero. 

This seemingly simple approach is the basis for a surprisingly large variety of tests of the random walk, and we shall develop these tests in this chapter. For example, tests of the random walk may be based on the autocorrelation coefficients themselves (Section 2.4.1). More powerful tests may be constructed from the sum of squared autocorrelations (Section 2.4.2). Linear combinations of the autocorrelations may also have certain advantages in detecting particular departures from the random walk (Sections 2.4.3 and 2.5). Therefore, we shall devote considerable attention to the properties of autocorrelation coefficients in the coming sections.


2.4.1 Autocorrelation Coefficients 


The autocorrelation coefficient is a natural time-series extension of the well-known correlation coefficient between two random variables $x$ and $y$:

$$\mathrm{Corr}[x,y] \equiv \frac{\mathrm{Cov}[x,y]}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}. \tag{2.4.1}$$

Given a covariance-stationary time series $\{r_t\}$, the $k$th order autocovariance and autocorrelation coefficients, $\gamma(k)$ and $\rho(k)$, respectively, are defined as⁶

$$\gamma(k) \equiv \mathrm{Cov}[r_t, r_{t+k}] \tag{2.4.2}$$

$$\rho(k) \equiv \frac{\mathrm{Cov}[r_t, r_{t+k}]}{\sqrt{\mathrm{Var}[r_t]\,\mathrm{Var}[r_{t+k}]}} = \frac{\mathrm{Cov}[r_t, r_{t+k}]}{\mathrm{Var}[r_t]} = \frac{\gamma(k)}{\gamma(0)}, \tag{2.4.3}$$


where the second equality in (2.4.3) follows from the covariance-stationarity of $\{r_t\}$. For a given sample $\{r_t\}_{t=1}^{T}$, autocovariance and autocorrelation coefficients may be estimated in the natural way by replacing population moments with sample counterparts:


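$$\hat\gamma(k) \equiv \frac{1}{T}\sum_{t=1}^{T-k}(r_t - \bar r_T)(r_{t+k} - \bar r_T), \qquad 0 \le k < T \tag{2.4.4}$$

$$\hat\rho(k) \equiv \frac{\hat\gamma(k)}{\hat\gamma(0)} \tag{2.4.5}$$

$$\bar r_T \equiv \frac{1}{T}\sum_{t=1}^{T} r_t. \tag{2.4.6}$$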
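Estimators (2.4.4) through (2.4.6) translate directly into code. The following is a minimal plain-Python sketch. The function names are illustrative, and the divisor is $T$ rather than $T-k$, as in (2.4.4).

```python
def sample_mean(r):
    """r_bar_T in (2.4.6)."""
    return sum(r) / len(r)


def sample_autocov(r, k):
    """gamma_hat(k) in (2.4.4); note the divisor is T, not T - k."""
    T = len(r)
    if not 0 <= k < T:
        raise ValueError("lag k must satisfy 0 <= k < T")
    rbar = sample_mean(r)
    return sum((r[t] - rbar) * (r[t + k] - rbar) for t in range(T - k)) / T


def sample_autocorr(r, k):
    """rho_hat(k) = gamma_hat(k) / gamma_hat(0), as in (2.4.5)."""
    return sample_autocov(r, k) / sample_autocov(r, 0)


print(sample_autocorr([1.0, 2.0, 3.0, 4.0], 1))   # 0.25
```

Even a perfectly linear series yields only $\hat\rho(1) = 0.25$ here. This is because the $1/T$ divisor and the short overlap shrink the estimate toward zero in small samples.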


The sampling theory for $\hat\gamma(k)$ and $\hat\rho(k)$ depends, of course, on the data-generating process for $\{r_t\}$. For example, if $r_t$ is a finite-order moving average,

$$r_t = \sum_{k=0}^{M} \alpha_k\,\epsilon_{t-k},$$

where $\{\epsilon_t\}$ is an independent sequence with mean 0, variance $\sigma^2$, fourth moment $\eta\sigma^4$, and finite sixth moment, then Fuller (1976, Theorem 6.3.5) shows that the vector of autocovariance coefficient estimators is asymptotically multivariate normal:


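$$\sqrt{T}\,\big[\hat\gamma(0)-\gamma(0)\;\;\hat\gamma(1)-\gamma(1)\;\;\cdots\;\;\hat\gamma(m)-\gamma(m)\big]' \overset{a}{\sim} \mathcal{N}(\mathbf{0}, \mathbf{V}), \tag{2.4.7}$$

where

$$\mathbf{V} \equiv [v_{ij}], \qquad v_{ij} = (\eta-3)\,\gamma(i)\gamma(j) + \sum_{l=-\infty}^{\infty}\big[\gamma(l)\gamma(l-i+j) + \gamma(l+j)\gamma(l-i)\big]. \tag{2.4.8}$$

⁶The requirement of covariance-stationarity is primarily for notational convenience; otherwise $\gamma(k)$ and $\rho(k)$ may be functions of $t$ as well as $k$, and may not even be well-defined if second moments are not finite.

Under the same assumptions, Fuller (1976, Corollary 6.3.5.1) shows that the asymptotic distribution of the vector of autocorrelation coefficient estimators is also multivariate normal:

$$\sqrt{T}\,\big[\hat\rho(1)-\rho(1)\;\;\cdots\;\;\hat\rho(m)-\rho(m)\big]' \overset{a}{\sim} \mathcal{N}(\mathbf{0}, \tilde{\mathbf{V}}), \tag{2.4.9}$$

where

$$\tilde{\mathbf{V}} \equiv [\tilde v_{ij}], \qquad \tilde v_{ij} = \sum_{l=-\infty}^{\infty}\big[\rho(l)\rho(l-i+j) + \rho(l+j)\rho(l-i) - 2\rho(j)\rho(l)\rho(l-i) - 2\rho(i)\rho(l)\rho(l-j) + 2\rho(i)\rho(j)\rho^2(l)\big]. \tag{2.4.10}$$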


For purposes of testing the random walk hypotheses in which all the population autocovariances are zero, these asymptotic approximations reduce to simpler forms and more can be said of their finite-sample means and variances. In particular, if $\{r_t\}$ satisfies RW1 and has variance $\sigma^2$ and sixth moment proportional to $\sigma^6$, then


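$$\mathrm{E}[\hat\rho(k)] = -\frac{T-k}{T(T-1)} + O(T^{-2}) \tag{2.4.11}$$

$$\mathrm{Cov}[\hat\rho(k), \hat\rho(l)] = \begin{cases} \dfrac{T-k}{T^2} + O(T^{-2}) & \text{if } k = l \neq 0 \\[4pt] O(T^{-2}) & \text{otherwise.} \end{cases} \tag{2.4.12}$$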


From (2.4.11) we see that under RW1, where $\rho(k) = 0$ for all $k > 0$, the sample autocorrelation coefficients $\hat\rho(k)$ are negatively biased. This negative bias comes from the fact that the autocorrelation coefficient is a scaled sum of cross-products of deviations of $r_t$ from its mean, and if the mean is unknown it must be estimated, most commonly by the sample mean (2.4.6). But deviations from the sample mean sum to zero by construction; therefore positive deviations must eventually be followed by negative deviations on average and vice versa, and hence the expected value of cross-products of deviations is negative.

For smaller samples this effect can be significant: The expected value of $\hat\rho(1)$ for a sample size of 10 observations is $-10\%$. Under RW1, Fuller (1976) proposes the following bias-corrected estimator $\tilde\rho(k)$:⁷

$$\tilde\rho(k) = \hat\rho(k) + \frac{T-k}{(T-1)^2}\big(1 - \hat\rho^2(k)\big). \tag{2.4.13}$$

⁷Note that $\tilde\rho(k)$ is not unbiased; the term "bias-corrected" refers to the fact that $\mathrm{E}[\tilde\rho(k)] = O(T^{-2})$.

With uniformly bounded sixth moments, he shows that the sample autocorrelation coefficients are asymptotically independent and normally distributed with distribution:

$$\sqrt{T}\,\hat\rho(k) \overset{a}{\sim} \mathcal{N}(0,1) \tag{2.4.14}$$

$$\frac{T}{\sqrt{T-k}}\,\tilde\rho(k) \overset{a}{\sim} \mathcal{N}(0,1). \tag{2.4.15}$$
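The bias formula (2.4.11), Fuller's correction (2.4.13), and the standardized statistics (2.4.14) and (2.4.15) can be checked numerically. This sketch assumes IID standard normal data for the Monte Carlo part (an assumption for illustration only). It uses only the standard library, and all function names are hypothetical.

```python
import math
import random


def sample_autocorr(r, k):
    """rho_hat(k) from (2.4.4)-(2.4.6); the common 1/T factor cancels."""
    T = len(r)
    rbar = sum(r) / T
    dev = [x - rbar for x in r]
    gamma0 = sum(d * d for d in dev)
    gammak = sum(dev[t] * dev[t + k] for t in range(T - k))
    return gammak / gamma0


def leading_bias(T, k):
    """Leading term of E[rho_hat(k)] under RW1, from (2.4.11)."""
    return -(T - k) / (T * (T - 1))


def fuller_corrected(rho_hat, T, k):
    """Fuller's bias-corrected estimator rho_tilde(k), (2.4.13)."""
    return rho_hat + (T - k) / (T - 1) ** 2 * (1.0 - rho_hat ** 2)


def z_statistics(r, k):
    """Standardized statistics of (2.4.14) and (2.4.15)."""
    T = len(r)
    rho_hat = sample_autocorr(r, k)
    rho_tilde = fuller_corrected(rho_hat, T, k)
    return math.sqrt(T) * rho_hat, T / math.sqrt(T - k) * rho_tilde


def mc_mean_rho1(T=10, reps=20000, seed=0):
    """Monte Carlo mean of rho_hat(1) for IID N(0, 1) data (assumed design)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sample_autocorr([rng.gauss(0.0, 1.0) for _ in range(T)], 1)
    return total / reps
```

With $T = 10$, `leading_bias(10, 1)` is $-0.1$, and the simulated mean from `mc_mean_rho1()` should land close to it. This is the $-10\%$ figure quoted above.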


These results yield a variety of autocorrelation-based tests of the random walk hypothesis RW1.

Lo and MacKinlay (1988), Richardson and Smith (1994), and Romano and Thombs (1996) derive asymptotic approximations for sample autocorrelation coefficients under even weaker conditions (uncorrelated weakly dependent observations), and these results may be used to construct tests of RW2 and RW3 (see Section 2.4.3 below).


2.4.2 Portmanteau Statistics 


Since RW1 implies that all autocorrelations are zero, a simple test statistic of RW1 that has power against many alternative hypotheses is the Q-statistic due to Box and Pierce (1970):

$$Q_m \equiv T\sum_{k=1}^{m}\hat\rho^2(k). \tag{2.4.16}$$

Under the RW1 null hypothesis, and using (2.4.14), it is easy to see that $Q_m = T\sum_{k=1}^{m}\hat\rho^2(k)$ is asymptotically distributed as $\chi^2_m$. Ljung and Box (1978) provide the following finite-sample correction, which yields a better fit to the $\chi^2_m$ for small sample sizes:

$$Q'_m \equiv T(T+2)\sum_{k=1}^{m}\frac{\hat\rho^2(k)}{T-k}. \tag{2.4.17}$$
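Both portmanteau statistics are straightforward to compute. The following plain-Python sketch of (2.4.16) and (2.4.17) reuses the sample autocorrelations of (2.4.4) through (2.4.6). The function names are illustrative, and p-values are left out: each statistic would be compared with a $\chi^2_m$ critical value (3.84 for $m = 1$ at the 5% level).

```python
def autocorrs(r, m):
    """rho_hat(1), ..., rho_hat(m) as defined in (2.4.4)-(2.4.6)."""
    T = len(r)
    rbar = sum(r) / T
    dev = [x - rbar for x in r]
    g0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(T - k)) / g0
            for k in range(1, m + 1)]


def box_pierce(r, m):
    """Q_m = T * sum_{k=1}^m rho_hat(k)^2, eq. (2.4.16)."""
    return len(r) * sum(p * p for p in autocorrs(r, m))


def ljung_box(r, m):
    """Q'_m = T(T+2) * sum_{k=1}^m rho_hat(k)^2 / (T - k), eq. (2.4.17)."""
    T = len(r)
    return T * (T + 2) * sum(p * p / (T - k)
                             for k, p in enumerate(autocorrs(r, m), start=1))
```

Because every Ljung-Box weight $T(T+2)/(T-k)$ exceeds $T$, $Q'_m \ge Q_m$. For the four-point alternating series $[1, -1, 1, -1]$, the two statistics are 4.5 and 2.25, which fall on opposite sides of 3.84.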

By summing the squared autocorrelations, the Box-Pierce Q-statistic is designed to detect departures from zero autocorrelations in either direction and at all lags. Therefore, it has power against a broad range of alternative hypotheses to the random walk. However, selecting the number of autocorrelations $m$ requires some care: if too few are used, the presence of higher-order autocorrelation may be missed; if too many are used, the test may not have much power due to insignificant higher-order autocorrelations. Therefore, while such a portmanteau statistic does have some appeal, better tests of the random walk hypotheses may be available when specific alternative hypotheses can be identified. We shall turn to such examples in the next sections.


2. The Predictability of Asset Returns


2.4.3 Variance Ratios 


An important property of all three random walk hypotheses is that the variance of random walk increments must be a linear function of the time interval.⁸ For example, under RW1 for log prices where continuously compounded returns $r_t \equiv \log P_t - \log P_{t-1}$ are IID, the variance of $r_t + r_{t-1}$ must be twice the variance of $r_t$. Therefore, the plausibility of the random walk model may be checked by comparing the variance of $r_t + r_{t-1}$ to twice the variance of $r_t$.⁹ Of course, in practice these will not be numerically identical even if RW1 were true, but their ratio should be statistically indistinguishable from one. Therefore, to construct a statistical test of the random walk hypothesis using variance ratios, we require their sampling distribution under the random walk null hypothesis.


Population Properties of Variance Ratios 

Before deriving such sampling distributions, we develop some intuition for the population values of the variance ratio statistic under various scenarios. Consider again the ratio of the variance of a two-period continuously compounded return $r_t(2) \equiv r_t + r_{t-1}$ to twice the variance of a one-period return $r_t$, and for the moment let us assume nothing about the time series of returns other than stationarity. Then this variance ratio, which we write as VR(2), reduces to:

$$\mathrm{VR}(2) \equiv \frac{\mathrm{Var}[r_t(2)]}{2\,\mathrm{Var}[r_t]} = \frac{\mathrm{Var}[r_t + r_{t-1}]}{2\,\mathrm{Var}[r_t]} = \frac{2\,\mathrm{Var}[r_t] + 2\,\mathrm{Cov}[r_t, r_{t-1}]}{2\,\mathrm{Var}[r_t]}$$

$$\mathrm{VR}(2) = 1 + \rho(1), \tag{2.4.18}$$

where $\rho(1)$ is the first-order autocorrelation coefficient of returns $\{r_t\}$. For any stationary time series, the population value of the variance ratio statistic VR(2) is simply one plus the first-order autocorrelation coefficient. In particular, under RW1 all the autocorrelations are zero, hence VR(2) = 1 in this case, as expected.

In the presence of positive first-order autocorrelation, VR(2) will exceed one. If returns are positively autocorrelated, the variance of the sum of two one-period returns will be larger than the sum of the one-period returns' variances; hence variances will grow faster than linearly. Alternatively, in the presence of negative first-order autocorrelation, the variance of the sum of two one-period returns will be smaller than the sum of the one-period returns' variances; hence variances will grow slower than linearly.

⁸This linearity property is more difficult to state in the case of RW2 and RW3 because the variances of increments may vary through time. However, even in these cases the variance of the sum must equal the sum of the variances, and this is the linearity property which the variance ratio test exploits. We shall construct tests of all three hypotheses below.

⁹Many studies have explored this property of the random walk hypothesis in devising empirical tests of predictability; recent examples include Campbell and Mankiw (1987), Cochrane (1988), Faust (1992), Lo and MacKinlay (1988), Poterba and Summers (1988), Richardson (1993), and Richardson and Stock (1989).

For comparisons beyond one- and two-period returns, higher-order autocorrelations come into play. In particular, a similar calculation shows that the general $q$-period variance ratio statistic VR(q) satisfies the relation:

$$\mathrm{VR}(q) \equiv \frac{\mathrm{Var}[r_t(q)]}{q\,\mathrm{Var}[r_t]} = 1 + 2\sum_{k=1}^{q-1}\Big(1 - \frac{k}{q}\Big)\rho(k), \tag{2.4.19}$$

where $r_t(q) \equiv r_t + r_{t-1} + \cdots + r_{t-q+1}$ and $\rho(k)$ is the $k$th order autocorrelation coefficient of $\{r_t\}$. This shows that VR(q) is a particular linear combination of the first $q-1$ autocorrelation coefficients of $\{r_t\}$, with linearly declining weights.

Under RW1, (2.4.19) shows that for all $q$, VR(q) = 1 since in this case $\rho(k) = 0$ for all $k \ge 1$. Moreover, even under RW2 and RW3, VR(q) must still equal one as long as the variances of $r_t$ are finite and the "average variance" $\sum_{t=1}^{T}\mathrm{Var}[r_t]/T$ converges to a finite positive number. But (2.4.19) is even more informative for alternatives to the random walk because it relates the behavior of VR(q) to the autocorrelation coefficients of $\{r_t\}$ under such alternatives. For example, under an AR(1) alternative, $r_t = \phi r_{t-1} + \epsilon_t$, (2.4.19) implies that

$$\mathrm{VR}(q) = 1 + 2\sum_{k=1}^{q-1}\Big(1 - \frac{k}{q}\Big)\phi^k = \frac{1+\phi}{1-\phi} - \frac{2\phi\,(1-\phi^q)}{q\,(1-\phi)^2}.$$


Relations such as this are critical for constructing alternative hypotheses for 
which the variance ratio test has high and low power, and we shall return to 
this issue below. 
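The relation (2.4.19) and its AR(1) special case can be checked against each other numerically. In the sketch below, `vr_from_autocorr` evaluates the weighted sum in (2.4.19) for any list of population autocorrelations, and `vr_ar1` evaluates the closed form above. Both names are our own.

```python
def vr_from_autocorr(rho, q):
    """Population VR(q) from (2.4.19); rho[k-1] holds rho(k) for k = 1..q-1."""
    if q < 1 or len(rho) < q - 1:
        raise ValueError("need rho(1), ..., rho(q-1)")
    return 1.0 + 2.0 * sum((1.0 - k / q) * rho[k - 1] for k in range(1, q))


def vr_ar1(phi, q):
    """Closed form for an AR(1) alternative r_t = phi * r_{t-1} + eps_t, |phi| < 1."""
    return (1 + phi) / (1 - phi) - 2 * phi * (1 - phi ** q) / (q * (1 - phi) ** 2)
```

For $q = 2$ both functions reduce to $1 + \phi$, in agreement with (2.4.18). As $q$ grows, the closed form approaches $(1+\phi)/(1-\phi)$, so a positive $\phi$ pushes long-horizon variance ratios above one and a negative $\phi$ pushes them below one.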


Sampling Distribution of VD(q) and VR(q) under RW1

To construct a statistical test for RW1 we follow the exposition of Lo and MacKinlay (1988) and begin by stating the null hypothesis $H_0$ under which the sampling distribution of the test statistics will be derived.¹⁰ Let $p_t$ denote the log price process and $r_t \equiv p_t - p_{t-1}$ continuously compounded returns. Then the null hypothesis we consider in this section is¹¹

$$H_0:\quad r_t = \mu + \epsilon_t, \qquad \epsilon_t\ \text{IID}\ \mathcal{N}(0, \sigma^2).$$

¹⁰For alternative expositions see Campbell and Mankiw (1987), Cochrane (1988), Faust (1992), Poterba and Summers (1988), Richardson (1993), and Richardson and Stock (1989).


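Let our data consist of $2n+1$ observations of log prices $\{p_0, p_1, \ldots, p_{2n}\}$, and consider the following estimators for $\mu$ and $\sigma^2$:

$$\hat\mu \equiv \frac{1}{2n}\sum_{k=1}^{2n}(p_k - p_{k-1}) = \frac{1}{2n}(p_{2n} - p_0) \tag{2.4.20}$$

$$\hat\sigma_a^2 \equiv \frac{1}{2n}\sum_{k=1}^{2n}(p_k - p_{k-1} - \hat\mu)^2 \tag{2.4.21}$$

$$\hat\sigma_b^2 \equiv \frac{1}{2n}\sum_{k=1}^{n}(p_{2k} - p_{2k-2} - 2\hat\mu)^2. \tag{2.4.22}$$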


Equations (2.4.20) and (2.4.21) are the usual sample mean and variance estimators. They are also the maximum-likelihood estimators of $\mu$ and $\sigma^2$ (see Section 9.3.2 in Chapter 9). The second estimator $\hat\sigma_b^2$ of $\sigma^2$ makes use of the random walk nature of $p_t$: Under RW1 the mean and variance of increments are linear in the increment interval, hence $\sigma^2$ can be estimated by one-half the sample variance of the increments of even-numbered observations $\{p_0, p_2, p_4, \ldots, p_{2n}\}$.

Under standard asymptotic theory, all three estimators are strongly consistent: Holding all other parameters constant, as the total number of observations $2n$ increases without bound the estimators converge almost surely to their population values. In addition, it is well known that $\hat\sigma_a^2$ and $\hat\sigma_b^2$ possess the following normal limiting distributions (see, for example, Stuart and Ord [1987]):

$$\sqrt{2n}\,(\hat\sigma_a^2 - \sigma^2) \overset{a}{\sim} \mathcal{N}(0, 2\sigma^4) \tag{2.4.23}$$

$$\sqrt{2n}\,(\hat\sigma_b^2 - \sigma^2) \overset{a}{\sim} \mathcal{N}(0, 4\sigma^4). \tag{2.4.24}$$


Although it may readily be shown that the ratio is also asymptotically normal with unit mean under RW1, the variance of the limiting distribution is not apparent, since the two variance estimators are clearly not asymptotically uncorrelated.

But since the estimator $\hat\sigma_a^2$ is asymptotically efficient under the null hypothesis RW1, we may use Hausman's (1978) insight that the asymptotic variance of the difference of a consistent estimator and an asymptotically efficient estimator is simply the difference of the asymptotic variances.¹² If we define the variance difference estimator as $\mathrm{VD}(2) \equiv \hat\sigma_b^2 - \hat\sigma_a^2$, then (2.4.23), (2.4.24), and Hausman's result imply:

$$\sqrt{2n}\,\mathrm{VD}(2) \overset{a}{\sim} \mathcal{N}(0, 2\sigma^4). \tag{2.4.25}$$

The null hypothesis $H_0$ can then be tested using (2.4.25) and any consistent estimator $2\hat\sigma^4$ of $2\sigma^4$ (for example, $2(\hat\sigma_a^2)^2$): Construct the standardized statistic $\sqrt{2n}\,\mathrm{VD}(2)/\sqrt{2\hat\sigma^4}$, which has a limiting standard normal distribution under RW1, and reject the null hypothesis at the 5% level if it lies outside the interval $[-1.96, 1.96]$.

¹¹We assume normality only for expositional convenience; the results in this section apply much more generally to log price processes with IID increments that possess finite fourth moments.

The asymptotic distribution of the two-period variance ratio statistic $\mathrm{VR}(2) \equiv \hat\sigma_b^2/\hat\sigma_a^2$ now follows directly from (2.4.25) using a first-order Taylor approximation or the delta method (see Section A.4 of the Appendix):¹³

$$\sqrt{2n}\,\big(\mathrm{VR}(2) - 1\big) \overset{a}{\sim} \mathcal{N}(0, 2). \tag{2.4.26}$$


The null hypothesis $H_0$ can be tested by computing the standardized statistic $\sqrt{2n}\,(\mathrm{VR}(2) - 1)/\sqrt{2}$, which is asymptotically standard normal; if it lies outside the interval $[-1.96, 1.96]$, RW1 may be rejected at the 5% level of significance.
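The two-period test can be sketched directly from (2.4.20) through (2.4.22) and (2.4.26). The input is a list of $2n+1$ log prices. `simulated_log_prices` is a hypothetical helper that generates an RW1 sample with Gaussian increments, included only to make the example self-contained.

```python
import math
import random


def vr2_test(p):
    """VR(2) and its RW1 z-statistic from log prices p_0, ..., p_{2n}.

    Implements (2.4.20)-(2.4.22) and the standardization of (2.4.26).
    """
    N = len(p) - 1                     # N = 2n one-period increments
    if N < 2 or N % 2:
        raise ValueError("need 2n + 1 log prices with n >= 1")
    n = N // 2
    mu = (p[-1] - p[0]) / N                                              # (2.4.20)
    s2a = sum((p[k] - p[k - 1] - mu) ** 2 for k in range(1, N + 1)) / N  # (2.4.21)
    s2b = sum((p[2 * k] - p[2 * k - 2] - 2 * mu) ** 2
              for k in range(1, n + 1)) / N                              # (2.4.22)
    vr = s2b / s2a
    z = math.sqrt(N) * (vr - 1.0) / math.sqrt(2.0)                       # (2.4.26)
    return vr, z


def simulated_log_prices(N, mu=0.0, sigma=0.01, seed=1):
    """Hypothetical RW1 sample: p_t = p_{t-1} + mu + sigma * N(0, 1)."""
    rng = random.Random(seed)
    p = [0.0]
    for _ in range(N):
        p.append(p[-1] + mu + sigma * rng.gauss(0.0, 1.0))
    return p
```

RW1 is rejected at the 5% level when `abs(z) > 1.96`. A price series that alternates up and down by the same amount has two-period increments of zero. It therefore gives VR(2) = 0 and a strongly negative z, which is the signature of negative first-order autocorrelation.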

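Although the variance ratio is often preferred to the variance difference because the ratio is scale-free, observe that if $2(\hat\sigma_a^2)^2$ is used to estimate $2\sigma^4$, then the standard significance test of VD = 0 for the difference will yield the same inferences as the corresponding test of VR − 1 = 0 for the ratio since:

$$\frac{\sqrt{2n}\,\mathrm{VD}(2)}{\sqrt{2\hat\sigma_a^4}} = \frac{\sqrt{2n}\,(\hat\sigma_b^2 - \hat\sigma_a^2)}{\sqrt{2}\,\hat\sigma_a^2} = \frac{\sqrt{2n}\,\big(\mathrm{VR}(2) - 1\big)}{\sqrt{2}} \overset{a}{\sim} \mathcal{N}(0,1). \tag{2.4.27}$$

Therefore, in this simple context the two test statistics are equivalent. However, there are other reasons that make the variance ratio more appealing, and these are discussed in Cochrane (1988), Faust (1992), and Lo and MacKinlay (1988, 1989).

¹²Briefly, Hausman (1978) exploits the fact that any asymptotically efficient estimator $\hat\theta_e$ of a parameter $\theta$ must possess the property that it is asymptotically uncorrelated with the difference $\hat\theta_a - \hat\theta_e$, where $\hat\theta_a$ is any other estimator of $\theta$. If not, then there exists a linear combination of $\hat\theta_e$ and $\hat\theta_a - \hat\theta_e$ that is more efficient than $\hat\theta_e$, contradicting the assumed efficiency of $\hat\theta_e$. The result follows directly, then, since:

$$\mathrm{aVar}[\hat\theta_a] = \mathrm{aVar}[\hat\theta_e + \hat\theta_a - \hat\theta_e] = \mathrm{aVar}[\hat\theta_e] + \mathrm{aVar}[\hat\theta_a - \hat\theta_e]$$

$$\Rightarrow\quad \mathrm{aVar}[\hat\theta_a - \hat\theta_e] = \mathrm{aVar}[\hat\theta_a] - \mathrm{aVar}[\hat\theta_e],$$

where aVar[·] denotes the asymptotic variance operator.

¹³In particular, apply the delta method to $f(\hat\theta_1, \hat\theta_2) \equiv \hat\theta_1/\hat\theta_2$, where $\hat\theta_1 \equiv \hat\sigma_b^2 = \hat\sigma_a^2 + \mathrm{VD}(2)$ and $\hat\theta_2 \equiv \hat\sigma_a^2$, and observe that $\hat\sigma_a^2$ and VD(2) are asymptotically uncorrelated because $\hat\sigma_a^2$ is an efficient estimator.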

The variance difference and ratio statistics can be easily generalized to multiperiod returns. Let our sample consist of $nq+1$ observations $\{p_0, p_1, \ldots, p_{nq}\}$, where $q$ is any integer greater than one, and define the estimators:

$$\hat\mu \equiv \frac{1}{nq}\sum_{k=1}^{nq}(p_k - p_{k-1}) = \frac{1}{nq}(p_{nq} - p_0) \tag{2.4.28}$$

$$\hat\sigma_a^2 \equiv \frac{1}{nq}\sum_{k=1}^{nq}(p_k - p_{k-1} - \hat\mu)^2 \tag{2.4.29}$$

$$\hat\sigma_b^2(q) \equiv \frac{1}{nq}\sum_{k=1}^{n}(p_{qk} - p_{qk-q} - q\hat\mu)^2 \tag{2.4.30}$$

$$\mathrm{VD}(q) \equiv \hat\sigma_b^2(q) - \hat\sigma_a^2, \qquad \mathrm{VR}(q) \equiv \frac{\hat\sigma_b^2(q)}{\hat\sigma_a^2}. \tag{2.4.31}$$

Using similar arguments, the asymptotic distributions of VD(q) and VR(q) under the RW1 null hypothesis are

$$\sqrt{nq}\,\mathrm{VD}(q) \overset{a}{\sim} \mathcal{N}\big(0, 2(q-1)\sigma^4\big) \tag{2.4.32}$$

$$\sqrt{nq}\,\big(\mathrm{VR}(q) - 1\big) \overset{a}{\sim} \mathcal{N}\big(0, 2(q-1)\big). \tag{2.4.33}$$


Two important refinements of these statistics can improve their finite-sample properties substantially. The first is to use overlapping $q$-period returns in estimating the variances by defining the following alternative estimator for $\sigma^2$:

$$\hat\sigma_c^2(q) \equiv \frac{1}{nq^2}\sum_{k=q}^{nq}(p_k - p_{k-q} - q\hat\mu)^2. \tag{2.4.34}$$

This estimator contains $nq - q + 1$ terms, whereas the estimator $\hat\sigma_b^2(q)$ contains only $n$ terms. Using overlapping $q$-period returns yields a more efficient estimator and hence a more powerful test.
The second refinement involves correcting the bias in the variance estimators $\hat\sigma_a^2$ and $\hat\sigma_c^2(q)$ before dividing one by the other. Denote the unbiased estimators as $\bar\sigma_a^2$ and $\bar\sigma_c^2(q)$, where

$$\bar\sigma_a^2 = \frac{1}{nq-1}\sum_{k=1}^{nq}(p_k - p_{k-1} - \hat\mu)^2 \tag{2.4.35}$$

$$\bar\sigma_c^2(q) = \frac{1}{m}\sum_{k=q}^{nq}(p_k - p_{k-q} - q\hat\mu)^2, \qquad m \equiv q\,(nq - q + 1)\Big(1 - \frac{q}{nq}\Big), \tag{2.4.36}$$

and define the statistics:

$$\overline{\mathrm{VD}}(q) \equiv \bar\sigma_c^2(q) - \bar\sigma_a^2, \qquad \overline{\mathrm{VR}}(q) \equiv \frac{\bar\sigma_c^2(q)}{\bar\sigma_a^2}. \tag{2.4.37}$$

This yields an unbiased variance difference estimator; however, the variance ratio estimator is still biased (due to Jensen's Inequality). Nevertheless, simulation experiments reported in Lo and MacKinlay (1989) show that the finite-sample properties of $\overline{\mathrm{VR}}(q)$ are closer to their asymptotic limits than those of VR(q).

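Under the null hypothesis $H_0$, the asymptotic distributions of the variance difference and variance ratio are given by

$$\sqrt{nq}\,\overline{\mathrm{VD}}(q) \overset{a}{\sim} \mathcal{N}\Big(0, \frac{2(2q-1)(q-1)}{3q}\,\sigma^4\Big) \tag{2.4.38}$$

$$\sqrt{nq}\,\big(\overline{\mathrm{VR}}(q) - 1\big) \overset{a}{\sim} \mathcal{N}\Big(0, \frac{2(2q-1)(q-1)}{3q}\Big). \tag{2.4.39}$$

These statistics can then be standardized in the usual way to yield asymptotically standard normal test statistics. As before, if $\sigma^4$ is estimated by $\bar\sigma_a^4$ in standardizing the variance difference statistic, the result is the same as the standardized variance ratio statistic:

$$z(q) \equiv \sqrt{nq}\,\overline{\mathrm{VD}}(q)\left(\frac{2(2q-1)(q-1)}{3q}\,\bar\sigma_a^4\right)^{-1/2} = \sqrt{nq}\,\big(\overline{\mathrm{VR}}(q) - 1\big)\left(\frac{2(2q-1)(q-1)}{3q}\right)^{-1/2} \overset{a}{\sim} \mathcal{N}(0,1). \tag{2.4.40}$$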
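The refined statistic combines overlapping $q$-period returns (2.4.34) with the bias corrections (2.4.35) through (2.4.37). Below is a plain-Python sketch of $\overline{\mathrm{VR}}(q)$ and the homoskedastic $z(q)$ of (2.4.40). The input is assumed to hold $nq+1$ log prices with $n \ge 2$ (for $n = 1$ the constant $m$ in (2.4.36) is zero), and the function name is our own.

```python
import math


def vr_overlapping(p, q):
    """Bias-corrected, overlapping VR_bar(q) and z(q) under RW1.

    p holds nq + 1 log prices p_0, ..., p_{nq}; follows (2.4.35)-(2.4.37)
    and (2.4.40).
    """
    N = len(p) - 1                                   # N = nq
    if q < 2 or N < 2 * q or N % q:
        raise ValueError("need nq + 1 prices with n >= 2 and q >= 2")
    mu = (p[-1] - p[0]) / N
    s2a_bar = sum((p[k] - p[k - 1] - mu) ** 2 for k in range(1, N + 1)) / (N - 1)
    m = q * (N - q + 1) * (1 - q / N)                # divisor in (2.4.36)
    s2c_bar = sum((p[k] - p[k - q] - q * mu) ** 2 for k in range(q, N + 1)) / m
    vr = s2c_bar / s2a_bar
    avar = 2 * (2 * q - 1) * (q - 1) / (3 * q)       # asymptotic variance in (2.4.39)
    z = math.sqrt(N) * (vr - 1) / math.sqrt(avar)
    return vr, z
```

For $q = 2$ the asymptotic variance in (2.4.39) equals one. A 101-point alternating series therefore gives $\overline{\mathrm{VR}}(2) = 0$ and $z(2) = -\sqrt{100} = -10$.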


Sampling Distribution of VR(q) under RW3 
Since there is a growing consensus among financial economists that volatilities change over time (see Section 12.2 in Chapter 12), a rejection of the random walk hypothesis because of heteroskedasticity would not be of much interest. Therefore, we seek a test for RW3. As long as returns are uncorrelated, even in the presence of heteroskedasticity the variance ratio must still approach unity as the number of observations increases without bound, for the variance of the sum of uncorrelated increments must still equal the sum of the variances. However, the asymptotic variance of the variance ratios will clearly depend on the type and degree of heteroskedasticity present.

One approach is to model the heteroskedasticity explicitly as in Section 12.2 of Chapter 12, and then calculate the asymptotic variance of VR(q) under this specific null hypothesis. However, to allow for more general forms of heteroskedasticity, we follow the approach taken by Lo and MacKinlay (1988), which relies on the heteroskedasticity-consistent methods of White (1980) and White and Domowitz (1984). This approach applies to a much broader class of log price processes $\{p_t\}$ than the IID normal increments process of the previous section, a particularly relevant concern for U.S. stock returns as Table 1.1 illustrates.¹⁴ Specifically, let $r_t = \mu + \epsilon_t$ and define the following compound null hypothesis $H_0^*$:


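(H1) For all $t$, $\mathrm{E}[\epsilon_t] = 0$, and $\mathrm{E}[\epsilon_t\epsilon_{t-\tau}] = 0$ for any $\tau \neq 0$.

(H2) $\{\epsilon_t\}$ is $\phi$-mixing with coefficients $\phi(m)$ of size $r/(2r-1)$ or is $\alpha$-mixing with coefficients $\alpha(m)$ of size $r/(r-1)$, where $r > 1$, such that for all $t$ and for any $\tau \ge 0$, there exists some $\delta > 0$ for which $\mathrm{E}\big[|\epsilon_t\epsilon_{t-\tau}|^{2(r+\delta)}\big] < \Delta < \infty$.

(H3) $\displaystyle\lim_{nq\to\infty}\frac{1}{nq}\sum_{t=1}^{nq}\mathrm{E}[\epsilon_t^2] = \sigma^2 < \infty$.

(H4) For all $t$, $\mathrm{E}[\epsilon_t\epsilon_{t-j}\epsilon_t\epsilon_{t-k}] = 0$ for any nonzero $j$ and $k$ where $j \neq k$.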


Condition (H1) is the uncorrelated increments property of the random walk that we wish to test. Conditions (H2) and (H3) are restrictions on the maximum degree of dependence and heterogeneity allowable while still permitting some form of the Law of Large Numbers and the Central Limit Theorem to obtain (see White [1984] for the definitions of $\phi$- and $\alpha$-mixing random sequences). Condition (H4) implies that the sample autocorrelations of $\epsilon_t$ are asymptotically uncorrelated; this condition may be weakened considerably at the expense of computational simplicity (see note 15).

This compound null hypothesis assumes that $p_t$ possesses uncorrelated increments but allows for quite general forms of heteroskedasticity, including deterministic changes in the variance (due, for example, to seasonal factors) and Engle's (1982) ARCH processes (in which the conditional variance depends on past information).

Since $\overline{\mathrm{VR}}(q)$ still approaches one under $H_0^*$, we need only compute its asymptotic variance [call it $\theta(q)$] to perform the standard inferences. Lo and MacKinlay (1988) do this in two steps. First, recall that the following equality holds asymptotically under quite general conditions:

$$\overline{\mathrm{VR}}(q) \overset{a}{=} 1 + 2\sum_{k=1}^{q-1}\Big(1 - \frac{k}{q}\Big)\hat\rho(k). \tag{2.4.41}$$


¹⁴Of course, second moments are still assumed to be finite; otherwise, the variance ratio is no longer well defined. This rules out distributions with infinite variance, such as those in the stable Pareto-Lévy family (with characteristic exponents that are less than 2) proposed by Mandelbrot (1963) and Fama (1965). However, many other forms of leptokurtosis are allowed, such as that generated by Engle's (1982) autoregressive conditionally heteroskedastic (ARCH) process (see Section 12.2 in Chapter 12).

Second, note that under $H_0^*$ (condition (H4)) the autocorrelation coefficient estimators $\hat\rho(k)$ are asymptotically uncorrelated.¹⁵ If the asymptotic variance $\delta_k$ of the $\hat\rho(k)$'s can be obtained under $H_0^*$, the asymptotic variance $\theta(q)$ of $\overline{\mathrm{VR}}(q)$ may be calculated as the weighted sum of the $\delta_k$'s, where the weights are simply the weights in relation (2.4.41) squared. Denote by $\delta_k$ and $\theta(q)$ the asymptotic variances of $\hat\rho(k)$ and $\overline{\mathrm{VR}}(q)$, respectively. Then under the null hypothesis $H_0^*$, Lo and MacKinlay (1988) show that

1. The statistics $\overline{\mathrm{VD}}(q)$ and $\overline{\mathrm{VR}}(q) - 1$ converge almost surely to zero for all $q$ as $n$ increases without bound.
2. The following is a heteroskedasticity-consistent estimator of $\delta_k$:
$$\hat\delta_k \equiv \frac{nq\sum_{j=k+1}^{nq}(p_j - p_{j-1} - \hat\mu)^2\,(p_{j-k} - p_{j-k-1} - \hat\mu)^2}{\Big[\sum_{j=1}^{nq}(p_j - p_{j-1} - \hat\mu)^2\Big]^2}. \tag{2.4.42}$$
3. The following is a heteroskedasticity-consistent estimator of $\theta(q)$:
$$\hat\theta(q) \equiv 4\sum_{k=1}^{q-1}\Big(1 - \frac{k}{q}\Big)^2\hat\delta_k. \tag{2.4.43}$$
kzl 

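Despite the presence of general heteroskedasticity, the standardized test statistic $z^*(q)$

$$z^*(q) \equiv \frac{\sqrt{nq}\,\big(\overline{\mathrm{VR}}(q) - 1\big)}{\sqrt{\hat\theta(q)}} \overset{a}{\sim} \mathcal{N}(0,1) \tag{2.4.44}$$

can be used to test $H_0^*$ in the usual way.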
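The heteroskedasticity-consistent statistic requires only one extra pass over the demeaned one-period increments. This sketch computes $\hat\delta_k$ (2.4.42), $\hat\theta(q)$ (2.4.43), and $z^*(q)$ (2.4.44) alongside $\overline{\mathrm{VR}}(q)$. The same input assumptions apply as for the homoskedastic version ($nq+1$ log prices, $n \ge 2$), and the function name is illustrative.

```python
import math


def vr_robust(p, q):
    """VR_bar(q), theta_hat(q), and the robust z*(q) of (2.4.42)-(2.4.44)."""
    N = len(p) - 1                                   # N = nq
    if q < 2 or N < 2 * q or N % q:
        raise ValueError("need nq + 1 prices with n >= 2 and q >= 2")
    mu = (p[-1] - p[0]) / N
    # e[j-1] is the demeaned increment p_j - p_{j-1} - mu, j = 1..N
    e = [p[j] - p[j - 1] - mu for j in range(1, N + 1)]
    s2a_bar = sum(x * x for x in e) / (N - 1)
    m = q * (N - q + 1) * (1 - q / N)
    s2c_bar = sum((p[k] - p[k - q] - q * mu) ** 2 for k in range(q, N + 1)) / m
    vr = s2c_bar / s2a_bar
    denom = sum(x * x for x in e) ** 2
    theta = 0.0
    for k in range(1, q):
        num = sum(e[j - 1] ** 2 * e[j - k - 1] ** 2 for j in range(k + 1, N + 1))
        delta_k = N * num / denom                        # (2.4.42)
        theta += 4 * (1 - k / q) ** 2 * delta_k          # (2.4.43)
    z_star = math.sqrt(N) * (vr - 1) / math.sqrt(theta)  # (2.4.44)
    return vr, theta, z_star
```

When every increment has the same magnitude, $\hat\delta_k = (nq-k)/nq$. In that case $\hat\theta(q)$ is close to the homoskedastic value $2(2q-1)(q-1)/3q$; for $q = 4$ and $nq = 100$ it is 3.45 versus 3.5. Under volatility clustering, the products in (2.4.42) typically push $\hat\theta(q)$ away from that value.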


2.5 Long-Horizon Returns 


Several recent studies have focused on the properties of long-horizon returns to test the random walk hypotheses, in some cases using 5- to 10-year returns over a 65-year sample. There are fewer nonoverlapping long-horizon returns for a given time span, so sampling errors are generally larger for statistics based on long-horizon returns. But for some alternatives to the random walk, long-horizon returns can be more informative than their shorter-horizon counterparts (see Section 7.2.1 in Chapter 7 and Lo and MacKinlay [1989]).

¹⁵Although this restriction on the fourth cross-moments of $\epsilon_t$ may seem somewhat unintuitive, it is satisfied for any process with independent increments (regardless of heterogeneity) and also for linear Gaussian ARCH processes. This assumption may be relaxed entirely, requiring the estimation of the asymptotic covariances of the autocorrelation estimators in order to estimate the limiting variance $\theta$ of $\overline{\mathrm{VR}}(q)$ via (2.4.41). Although the resulting estimator of $\theta$ would be more complicated than equation (2.4.43), it is conceptually straightforward and may readily be formed along the lines of Newey and West (1987). An even more general (and possibly more exact) sampling theory for the variance ratios may be obtained using the results of Dufour (1981) and Dufour and Roy (1985). Again, this would sacrifice much of the simplicity of our asymptotic results.

One motivation for using long-horizon returns is the permanent/transitory components alternative hypothesis, first proposed by Muth (1960) in a macroeconomic context. In this model, log prices are composed of two components: a random walk and a stationary process,

$$p_t = w_t + y_t \tag{2.5.1}$$

$$w_t = \mu + w_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \text{IID}(0, \sigma^2)$$

$$y_t = \text{any zero-mean stationary process},$$

and $\{w_t\}$ and $\{y_t\}$ are mutually independent. The common interpretation for (2.5.1) as a model of stock prices is that $w_t$ is the "fundamental" component that reflects the efficient markets price, and $y_t$ is a zero-mean stationary component that reflects a short-term or transitory deviation from the efficient-markets price $w_t$, implying the presence of "fads" or other market inefficiencies. Since $y_t$ is stationary, it is mean-reverting by definition and reverts to its mean of zero in the long run. Although there are several difficulties with such an interpretation of (2.5.1), not the least of which is the fact that market efficiency is tautological without additional economic structure, such an alternative nevertheless provides a good laboratory for studying the variance ratio's performance.

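While VR(q) can behave in many ways under (2.5.1) for small $q$ (depending on the correlation structure of $y_t$), as $q$ gets larger the behavior of VR(q) becomes less arbitrary. In particular, observe that

$$r_t = p_t - p_{t-1} = \mu + \epsilon_t + y_t - y_{t-1} \tag{2.5.2}$$

$$r_t(q) = \sum_{k=0}^{q-1} r_{t-k} = q\mu + \sum_{k=0}^{q-1}\epsilon_{t-k} + y_t - y_{t-q} \tag{2.5.3}$$

$$\mathrm{Var}[r_t(q)] = q\sigma^2 + 2\gamma_y(0) - 2\gamma_y(q), \tag{2.5.4}$$

where $\gamma_y(q) \equiv \mathrm{Cov}[y_t, y_{t+q}]$ is the autocovariance function of $y_t$. Therefore, in this case the population value of the variance ratio becomes

$$\mathrm{VR}(q) = \frac{\mathrm{Var}[r_t(q)]}{q\,\mathrm{Var}[r_t]} = \frac{q\sigma^2 + 2\gamma_y(0) - 2\gamma_y(q)}{q\,\big(\sigma^2 + 2\gamma_y(0) - 2\gamma_y(1)\big)} \tag{2.5.5}$$

$$\lim_{q\to\infty}\mathrm{VR}(q) = \frac{\sigma^2}{\sigma^2 + 2\gamma_y(0) - 2\gamma_y(1)} = \frac{\mathrm{Var}[\Delta w_t]}{\mathrm{Var}[\Delta w_t] + \mathrm{Var}[\Delta y_t]} \tag{2.5.6}$$

$$= 1 - \frac{\mathrm{Var}[\Delta y_t]}{\mathrm{Var}[\Delta p_t]}, \tag{2.5.7}$$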


where (2.5.6) requires the additional assumption that $\gamma_y(q) \to 0$ as $q \to \infty$, an asymptotic independence condition that is a plausible assumption for most economic time series.¹⁶ This shows that for a sufficiently long horizon $q$, the permanent/transitory components model must yield a variance ratio less than one. Moreover, the magnitude of the difference between the long-horizon variance ratio and one is the ratio of the variance of $\Delta y_t$ to the variance of $\Delta p_t$, a kind of "signal/(signal+noise)" ratio, where the "signal" is the transitory component and the "noise" is the permanent markets component. In fact, one might consider extracting the "signal/noise" ratio from VR(q) in the obvious way:

$$\frac{\mathrm{Var}[\Delta y_t]}{\mathrm{Var}[\Delta w_t]} = \frac{1}{\mathrm{VR}(\infty)} - 1.$$
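Equations (2.5.5) through (2.5.7) can be evaluated for any transitory component whose autocovariances are known. The sketch below assumes, purely for illustration, that $y_t$ is an AR(1) process $y_t = a\,y_{t-1} + u_t$ with $\mathrm{Var}[u_t] = s^2$, so that $\gamma_y(k) = s^2 a^{|k|}/(1-a^2)$. The parameter values and the function names are hypothetical.

```python
def vr_perm_trans(q, sigma2, gamma_y):
    """Population VR(q) under (2.5.1), via (2.5.5).

    sigma2: variance of the random-walk innovations eps_t.
    gamma_y: function k -> autocovariance gamma_y(k) of the stationary part.
    """
    num = q * sigma2 + 2 * gamma_y(0) - 2 * gamma_y(q)
    den = q * (sigma2 + 2 * gamma_y(0) - 2 * gamma_y(1))
    return num / den


def vr_limit(sigma2, gamma_y):
    """Long-horizon limit (2.5.6); assumes gamma_y(q) -> 0 as q grows."""
    return sigma2 / (sigma2 + 2 * gamma_y(0) - 2 * gamma_y(1))


def ar1_autocov(a, s2):
    """Hypothetical transitory component y_t = a * y_{t-1} + u_t, Var[u_t] = s2."""
    return lambda k: s2 * a ** abs(k) / (1 - a * a)


g = ar1_autocov(0.9, 0.1)
for q in (1, 2, 12, 60, 1000):
    print(q, round(vr_perm_trans(q, 1.0, g), 4))
print("limit:", round(vr_limit(1.0, g), 4), "signal/noise:", round(1 / vr_limit(1.0, g) - 1, 4))
```

VR(1) is one by construction. For long horizons the ratio settles at the limit (2.5.6), and `1 / vr_limit(...) - 1` recovers $\mathrm{Var}[\Delta y_t]/\mathrm{Var}[\Delta w_t] = 2(\gamma_y(0) - \gamma_y(1))/\sigma^2$.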


2.5.1 Problems with Long-Horizon Inferences 


There are, however, several difficulties with long-horizon returns that stem from the fact that when the horizon $q$ is large relative to the total time span $T = nq$, the asymptotic approximations that are typically used to perform inferences break down.

For example, consider the test statistic VR(q) − 1, which is asymptotically normal with mean 0 and variance:

$$V \equiv \frac{2(2q-1)(q-1)}{3nq^2} = \frac{4}{3n}\left[1 - \frac{3}{2q} + \frac{1}{2q^2}\right] \tag{2.5.8}$$

under the RW1 null hypothesis. Observe that for all $q \ge 2$, the bracketed term in (2.5.8) is bounded between $\frac{3}{8}$ and 1 and is monotonically increasing in $q$. Therefore, for fixed $n$, this implies upper and lower bounds for $V$ of $\frac{4}{3n}$ and $\frac{1}{2n}$, respectively. Now since variances cannot be negative, the lower


¹⁶This is implied by ergodicity, and even the long-range-dependent time series discussed in Section 2.6 satisfy this condition.




bound for VR(q)—1 is - I. But then the smallest algebraic value that the test 


statistic ( )/ can take on is: 


(/ ~ 1 -I - 
Min ————— = Vn = - (ӘТ. (2.5.9) 
VV Min JV 1 


бирре that q is set at two-thirds of the sample size T so that 7/425. This 
implies that the normalized test statistic VR(D/VV can never be Jess than 
— 1.73; hence the test will never reject the null hypothesis at the 95% level of 
significance, regardless of the data! Of course, the test statistic can still reject 
the null hypothesis by drawing from the right tail, but against alternative 
hypdtheses that imply variance ratios less than one for large q—such as tlie 
permanent/ transitory components model (2.5.1)—the variance ratio test 
will lave very little power when / T is not close to zero. 
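The bounds in (2.5.8) and (2.5.9) are easy to check numerically. The sketch below simply evaluates those formulas (the function names are illustrative); here n = T/q is allowed to be fractional, which is what the q/T = 2/3 example implicitly uses:

```python
import math


def rw1_variance(q, n):
    """V in (2.5.8): asymptotic variance of VR(q) - 1 under RW1, with T = n*q."""
    return 2.0 * (2 * q - 1) * (q - 1) / (3.0 * n * q ** 2)


def bracket(q):
    """Bracketed term of (2.5.8); equals 3/8 at q = 2 and rises toward 1."""
    return 1.0 - 3.0 / (2.0 * q) + 1.0 / (2.0 * q ** 2)


def min_normalized_stat(q_over_T):
    """Lower bound (2.5.9) on (VR(q) - 1)/sqrt(V), namely -sqrt(2T/q)."""
    return -math.sqrt(2.0 / q_over_T)


if __name__ == "__main__":
    for q in (2, 4, 16, 256):
        print(q, round(bracket(q), 4))
    print(round(min_normalized_stat(2.0 / 3.0), 3))  # compare with -1.96
```

Because −√3 ≈ −1.73 lies above the two-sided 5% critical value of −1.96, the left tail can never produce a rejection in this example.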

A more explicit illustration of the problems that arise when q/T is large may be obtained by performing an alternate asymptotic analysis, one in which q grows with T so that q(T)/T approaches some limit δ strictly between zero and one. In this case, under RW1 Richardson and Stock (1989) show that the unnormalized variance ratio VR(q) converges in distribution to the following:

$$\text{VR}(q) \Rightarrow \frac{1}{\delta}\int_{\delta}^{1} X^2(\tau)\,d\tau, \qquad (2.5.10)$$

$$X(\tau) \equiv B(\tau) - B(\tau - \delta) - \delta B(1), \qquad (2.5.11)$$

where B(·) is standard Brownian motion defined on the unit interval (see Section 9.1 in Chapter 9). Unlike the standard "fixed-q" asymptotics, in this case VR(q) does not converge in probability to one. Instead, it converges in distribution to a random variable that is a functional of Brownian motion. The expected value of this limiting distribution in (2.5.10) is

$$E\left[\frac{1}{\delta}\int_{\delta}^{1} X^2(\tau)\,d\tau\right] = \frac{1}{\delta}\int_{\delta}^{1} E[X^2(\tau)]\,d\tau = (1-\delta)^2. \qquad (2.5.12)$$

The last equality follows because E[X²(τ)] = δ − 2δ² + δ² = δ(1 − δ) for every τ ∈ [δ, 1]. In our earlier example where q/T = 2/3, the asymptotic approximation (2.5.10) implies that E[VR(q)] converges to 1/9, considerably less than one despite the fact that RW1 holds.
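A small Monte Carlo experiment illustrates (2.5.10) through (2.5.12). It assumes Gaussian IID increments, a sample of T = 300 with q = 200 (so δ = 2/3), and the simple unnormalized overlapping estimator below, which omits the small-sample corrections of Section 2.4; all of these are illustrative choices. Under these assumptions the average VR(q) should sit near (1 − δ)² = 1/9 rather than near one:

```python
import random


def variance_ratio(p, q):
    """Unnormalized overlapping VR(q) for log prices p_0, ..., p_T (no bias corrections)."""
    T = len(p) - 1
    mu = (p[T] - p[0]) / T                                   # mean one-period return
    var1 = sum((p[t] - p[t - 1] - mu) ** 2 for t in range(1, T + 1)) / T
    varq = sum((p[t] - p[t - q] - q * mu) ** 2 for t in range(q, T + 1)) / (q * T)
    return varq / var1


def mean_vr_under_rw1(T=300, delta=2.0 / 3.0, reps=2000, seed=0):
    """Average VR(q) over simulated Gaussian random walks, with q = round(delta * T)."""
    rng = random.Random(seed)
    q = int(round(delta * T))
    total = 0.0
    for _ in range(reps):
        p = [0.0]
        for _ in range(T):
            p.append(p[-1] + rng.gauss(0.0, 1.0))
        total += variance_ratio(p, q)
    return total / reps, q


if __name__ == "__main__":
    m, q = mean_vr_under_rw1()
    print(f"T = 300, q = {q}: mean VR(q) = {m:.3f}; (1 - delta)^2 = {1 / 9:.3f}")
```

Rerunning with a small δ, for example mean_vr_under_rw1(T=500, delta=0.02, reps=200), gives an average much closer to one, in line with the fixed-q asymptotics.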

These biases are not unexpected in light of the daunting demands we are placing on long-horizon returns: without more specific economic structure, it is extremely difficult to infer much about phenomena that span a significant portion of the entire dataset. This problem is closely related to one in spectral analysis: estimating the spectral density function near frequency zero. Frequencies near zero correspond to extremely long periods, and it is notoriously difficult to draw inferences about periodicities that exceed the span of the data. We shall see explicit evidence of such difficulties in the empirical results of Section 2.8. However, in some cases long-horizon returns can yield important insights, especially when other economic variables such as the dividend-price ratio come into play (see Section 7.2.1 in Chapter 7 for further discussion).


2.6 Tests For Long-Range Dependence 


There is one departure from the random walk hypothesis that is outside the statistical framework we have developed so far, and that is the phenomenon of long-range dependence. Long-range-dependent time series exhibit an unusually high degree of persistence (in a sense to be made precise below), so that observations in the remote past are nontrivially correlated with observations in the distant future, even as the time span between the two observations increases. Nature's predilection towards long-range dependence has been well documented in the natural sciences such as hydrology, meteorology, and geophysics, and some have argued that economic time series are also long-range dependent. In the frequency domain, such time series exhibit power at the lowest frequencies, and this was thought to be so commonplace a phenomenon that Granger (1966) dubbed it the "typical spectral shape of an economic variable." Mandelbrot and Wallis (1968) used the more colorful term "Joseph Effect," a reference to the passage in the Book of Genesis (Chapter 41) in which Joseph foretold the seven years of plenty followed by the seven years of famine that Egypt was to experience.


2.6.1 Examples of Long-Range Dependence 


A typical example of long-range dependence is given by the fractionally differenced time series models of Granger (1980), Granger and Joyeux (1980), and Hosking (1981), in which p_t satisfies the following difference equation:

$$(1 - L)^d p_t = \epsilon_t, \qquad \epsilon_t \sim \text{IID}(0, \sigma_\epsilon^2), \qquad (2.6.1)$$

where L is the lag operator, i.e., Lp_t = p_{t−1}. Granger and Joyeux (1980) and Hosking (1981) show that when the quantity (1 − L)^d is extended to noninteger powers of d in the mathematically natural way, the result is a


See the discussion and analysis in Section 2.6 for further details.

This biblical analogy is not completely frivolous, since long-range dependence has been documented in various hydrological studies, not the least of which was Hurst's (1951) seminal study on measuring the long-term storage capacity of reservoirs. Indeed, much of Hurst's research was motivated by his empirical observations of the Nile, the very same river that played so prominent a role in Joseph's prophecies.


well-defined time series that is said to be fractionally differenced of order d (or, equivalently, fractionally integrated of order −d). Briefly, this involves expanding the expression (1 − L)^d via the binomial theorem for noninteger powers:

$$(1 - L)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-L)^k, \qquad \binom{d}{k} \equiv \frac{d(d-1)(d-2)\cdots(d-k+1)}{k!}, \qquad (2.6.2)$$

and then applying the expansion to p_t:

$$(1 - L)^d p_t = \sum_{k=0}^{\infty} (-1)^k \binom{d}{k} p_{t-k} = \sum_{k=0}^{\infty} A_k\, p_{t-k} = \epsilon_t, \qquad (2.6.3)$$

where the autoregressive coefficients A_k are often re-expressed in terms of the gamma function:

$$A_k = (-1)^k \binom{d}{k} = \frac{\Gamma(k-d)}{\Gamma(-d)\,\Gamma(k+1)}. \qquad (2.6.4)$$

The process p_t may also be viewed as an infinite-order MA process since

$$p_t = (1 - L)^{-d} \epsilon_t = B(L)\,\epsilon_t, \qquad B_k = \frac{\Gamma(k+d)}{\Gamma(d)\,\Gamma(k+1)}. \qquad (2.6.5)$$
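The coefficients in (2.6.3) through (2.6.5) can be generated without evaluating gamma functions at large arguments, because consecutive ratios are simple: A_k/A_{k−1} = (k − 1 − d)/k and B_k/B_{k−1} = (k − 1 + d)/k, both of which follow from Γ(x + 1) = xΓ(x). The sketch below (function names are illustrative) builds both sequences, cross-checks A_k against the gamma form (2.6.4), and confirms that the truncated product of the AR and MA expansions reproduces the identity, as (1 − L)^d (1 − L)^{−d} = 1 requires:

```python
import math


def ar_weights(d, K):
    """A_0, ..., A_{K-1} of (1 - L)^d = sum_k A_k L^k, via A_k = A_{k-1} (k - 1 - d) / k."""
    A = [1.0]
    for k in range(1, K):
        A.append(A[-1] * (k - 1 - d) / k)
    return A


def ma_weights(d, K):
    """B_0, ..., B_{K-1} of (1 - L)^(-d) = sum_k B_k L^k, via B_k = B_{k-1} (k - 1 + d) / k."""
    B = [1.0]
    for k in range(1, K):
        B.append(B[-1] * (k - 1 + d) / k)
    return B


def ar_weight_gamma(d, k):
    """A_k from the gamma-function form (2.6.4); d must not be a nonnegative integer."""
    return math.gamma(k - d) / (math.gamma(-d) * math.gamma(k + 1))


def convolve_truncated(a, b):
    """First min(len(a), len(b)) coefficients of the product of two power series in L."""
    K = min(len(a), len(b))
    return [sum(a[j] * b[n - j] for j in range(n + 1)) for n in range(K)]


if __name__ == "__main__":
    d, K = 0.3, 8
    print([round(x, 4) for x in ar_weights(d, K)])
    print([round(x, 4) for x in ma_weights(d, K)])
    print([round(x, 12) for x in convolve_truncated(ar_weights(d, K), ma_weights(d, K))])
```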


It is not obvious that such a definition of fractional differencing might yield a useful stochastic process, but Granger (1980), Granger and Joyeux (1980), and Hosking (1981) show that the characteristics of fractionally differenced time series are interesting indeed. For example, they show that p_t is stationary and invertible for d ∈ (−1/2, 1/2) (see Hosking [1981]) and exhibits a unique kind of dependence that is positive or negative depending on whether d is positive or negative, i.e., the autocorrelation coefficients of p_t are of the same sign as d. So slowly do the autocorrelations decay that when d is positive their sum diverges to infinity, and collapses to zero when d is negative.

To develop a sense for long-range dependence, compare the autocorrelations of a fractionally differenced p_t with those of a stationary AR(1) in Table 2.3. Although both the AR(1) and the fractionally differenced (d = 1/3)


Mandelbrot and others have called the d < 0 case antipersistence, reserving the term long-range dependence for the d > 0 case. However, since both cases involve autocorrelations that decay much more slowly than those of more conventional time series, we call both long-range dependent.
Table 2.3. Autocorrelation function for fractionally differenced process.

Lag     ρ_p(k)       ρ_p(k)           ρ_p(k)
k       [d = 1/3]    [d = −1/3]       [AR(1), φ = .5]
1       0.500        −0.250           0.500
2       0.400        −0.071           0.250
3       0.350        −0.036           0.125
4       0.318        −0.022           0.063
5       0.295        −0.015           0.031
10      0.235        −0.005           0.001
25      0.173        −0.001           2.98 × 10^−8
50      0.137        −3.94 × 10^−4    8.88 × 10^−16
100     0.109        −1.02 × 10^−4    7.89 × 10^−31

Comparison of autocorrelation functions of fractionally differenced time series (1 − L)^d p_t = ε_t for d = 1/3, −1/3, with that of an AR(1) p_t = φp_{t−1} + ε_t, φ = .5. The variance of ε_t is scaled to yield a unit variance for p_t in all three cases.


series have first-order autocorrelations of 0.500, at lag 25 the AR(1) autocorrelation is 0.000 whereas the fractionally differenced series has correlation 0.173, declining only to 0.109 at lag 100. In fact, the defining characteristic of long-range dependent processes has been taken by many to be this slow decay of the autocovariance function.
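Table 2.3 can be reproduced from the closed form ρ_p(k) = Γ(1 − d)Γ(k + d)/(Γ(d)Γ(k + 1 − d)), which is the autocovariance formula (2.6.8) given below divided by γ_p(0). The sketch below evaluates it through the numerically stable recursion ρ_p(k) = ρ_p(k − 1)(k − 1 + d)/(k − d); the AR(1) column is simply φ^k. Function names are illustrative:

```python
def frac_diff_acf(d, max_lag):
    """rho(0..max_lag) for (1 - L)^d p_t = eps_t, |d| < 1/2, via rho(k) = rho(k-1)(k-1+d)/(k-d)."""
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho


def ar1_acf(phi, max_lag):
    """rho(0..max_lag) for a stationary AR(1) with coefficient phi."""
    return [phi ** k for k in range(max_lag + 1)]


if __name__ == "__main__":
    pos = frac_diff_acf(1.0 / 3.0, 100)
    neg = frac_diff_acf(-1.0 / 3.0, 100)
    ar = ar1_acf(0.5, 100)
    for k in (1, 2, 3, 4, 5, 10, 25, 50, 100):
        print(f"{k:>4} {pos[k]:8.3f} {neg[k]:11.3e} {ar[k]:11.3e}")
```

The printed values agree with Table 2.3 to the displayed precision at, for example, lags 1, 2, 10, 25, and 100 in all three columns.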

More generally, long-range dependent processes {p_t} may be defined to be those processes with autocovariance functions γ_p(k) such that

$$\gamma_p(k) \sim \begin{cases} k^{\nu} f_1(k) & \text{for } \nu \in (-1, 0), \text{ or} \\ -k^{\nu} f_1(k) & \text{for } \nu \in (-2, -1) \end{cases} \qquad \text{as } k \to \infty, \qquad (2.6.6)$$

where f_1(k) is any slowly varying function at infinity. Alternatively, long-range dependence has also been defined as processes with spectral density functions s(λ) such that

$$s(\lambda) \sim \lambda^{-\nu} f_2(\lambda) \quad \text{as } \lambda \to 0, \quad \nu \in (-1, 1), \qquad (2.6.7)$$

where f_2(λ) is a slowly varying function. For example, the autocovariance function and spectral density near frequency zero of the fractionally differenced process (2.6.1) are

$$\gamma_p(k) = \sigma_\epsilon^2\, \frac{\Gamma(1-2d)\,\Gamma(k+d)}{\Gamma(d)\,\Gamma(1-d)\,\Gamma(k+1-d)} \sim \sigma_\epsilon^2\, \frac{\Gamma(1-2d)}{\Gamma(d)\,\Gamma(1-d)}\, k^{2d-1} \quad \text{as } k \to \infty, \qquad (2.6.8)$$

$$s_p(\lambda) = (1 - e^{-i\lambda})^{-d}(1 - e^{i\lambda})^{-d}\, \sigma_\epsilon^2 \sim \sigma_\epsilon^2\, \lambda^{-2d} \quad \text{as } \lambda \to 0, \qquad (2.6.9)$$

where d ∈ (−1/2, 1/2). Depending on whether d is negative or positive, the spectral density of (2.6.1) at frequency zero will either be zero or infinite.

A function f(x) is said to be slowly varying at ∞ if lim_{x→∞} f(tx)/f(x) = 1 for all t ∈ (0, ∞). The function log x is an example of a slowly varying function at infinity.


2.6.2 The Hurst-Mandelbrot Rescaled Range Statistic


The importance of long-range dependence in asset markets was first studied by Mandelbrot (1971), who proposed using the range over standard deviation, or R/S, statistic, also called the rescaled range, to detect long-range dependence in economic time series. The R/S statistic was originally developed by the English hydrologist Harold Edwin Hurst (1951) in his studies of river discharges. The R/S statistic is the range of partial sums of deviations of a time series from its mean, rescaled by its standard deviation. Specifically, consider a sample of continuously compounded asset returns {r_1, r_2, ..., r_n} and let r̄_n denote the sample mean (1/n) Σ_j r_j. Then the classical rescaled-range statistic, which we shall call Q̃_n, is given by

$$\tilde{Q}_n \equiv \frac{1}{s_n}\left[\max_{1 \le k \le n} \sum_{j=1}^{k} (r_j - \bar{r}_n) - \min_{1 \le k \le n} \sum_{j=1}^{k} (r_j - \bar{r}_n)\right], \qquad (2.6.10)$$

where s_n is the usual (maximum likelihood) standard deviation estimator,

$$s_n \equiv \left[\frac{1}{n}\sum_{j} (r_j - \bar{r}_n)^2\right]^{1/2}. \qquad (2.6.11)$$

The first term in brackets in (2.6.10) is the maximum (over k) of the partial sums of the first k deviations of r_j from the sample mean. Since the sum of all n deviations of the r_j's from their mean is zero, this maximum is always nonnegative. The second term in (2.6.10) is the minimum (over k) of this same sequence of partial sums, and hence it is always nonpositive. The difference of the two quantities, called the range for obvious reasons, is always nonnegative and hence Q̃_n ≥ 0.
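A direct transcription of (2.6.10) and (2.6.11) is short. The sketch below (names are illustrative) separates the range from the rescaling so that the range alone can be checked against the reservoir example in the footnote on Hurst's hydrological motivation, where flows of 100, 50, 100, 50 require a capacity of 25 and flows of 100, 100, 50, 50 require 50:

```python
import math


def partial_sum_range(x):
    """Max minus min over k of the partial sums of the first k deviations from the mean."""
    mean = sum(x) / len(x)
    s, hi, lo = 0.0, -math.inf, math.inf
    for v in x:
        s += v - mean
        hi = max(hi, s)
        lo = min(lo, s)
    return hi - lo


def classical_rs(r):
    """Classical rescaled range (2.6.10), using the ML standard deviation (2.6.11)."""
    n = len(r)
    mean = sum(r) / n
    s_n = math.sqrt(sum((v - mean) ** 2 for v in r) / n)
    return partial_sum_range(r) / s_n


if __name__ == "__main__":
    print(partial_sum_range([100, 50, 100, 50]), partial_sum_range([100, 100, 50, 50]))
    print(classical_rs([100, 50, 100, 50]), classical_rs([100, 100, 50, 50]))
```

Both flow patterns have the same standard deviation, so the more persistent pattern doubles the rescaled range as well as the raw range.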


The behavior of Q̃_n may be better understood by considering its origins in hydrological studies of reservoir design. To accommodate seasonalities in riverflow, a reservoir's capacity must be chosen to allow for fluctuations in the supply of water above the dam while still maintaining a relatively constant flow of water below the dam. Since dam construction costs are immense, the importance of estimating the reservoir capacity necessary to meet long-term storage needs is apparent. The range is an estimate of this quantity. If X_j is the riverflow (per unit time) above the dam and X̄_n is the desired riverflow below the dam, the bracketed quantity in (2.6.10) is the capacity of the reservoir needed to ensure this smooth flow given the pattern of flows in periods 1 through n. For example, suppose annual riverflows are assumed to be 100, 50, 100, and 50 in years 1 through 4. If a constant annual flow of 75 below the dam is desired each year, a reservoir must have a minimum total capacity of 25 since it must store 25 units in years 1 and 3 to provide for the relatively dry years 2 and 4. Now suppose instead that the natural pattern of riverflow is 100, 100, 50, 50 in years 1 through 4. To ensure a flow of 75 below the dam in this case, the minimum capacity must increase to 50 so as to accommodate the excess storage needed in years 1 and 2 to supply water during the "dry spell" in years 3 and 4. Seen in this context, it is clear that an increase in persistence will increase the required storage capacity as measured by the range. Indeed, it was the apparent persistence of "dry spells" in Egypt that sparked Hurst's lifelong fascination with the Nile, leading eventually to his interest in the rescaled range.

In several seminal papers Mandelbrot, Taqqu, and Wallis demonstrate the superiority of R/S analysis to more conventional methods of determining long-range dependence, such as analyzing autocorrelations, variance ratios, and spectral decompositions. For example, Mandelbrot and Wallis (1969b) show by Monte Carlo simulation that the R/S statistic can detect long-range dependence in highly non-Gaussian time series with large skewness and/or kurtosis. In fact, Mandelbrot (1972, 1975) reports the almost-sure convergence of the R/S statistic for stochastic processes with infinite variances, a distinct advantage over autocorrelations and variance ratios, which need not be well defined for infinite-variance processes. Further aspects of the R/S statistic's robustness are developed in Mandelbrot and Taqqu (1979). Mandelbrot (1972) also argues that, unlike spectral analysis, which detects periodic cycles, R/S analysis can detect nonperiodic cycles, that is, cycles with periods equal to or greater than the sample period.

Although these claims may all be contested to some degree, it is a well-established fact that long-range dependence can indeed be detected by the "classical" R/S statistic. However, perhaps the most important shortcoming of the rescaled range is its sensitivity to short-range dependence, implying that any incompatibility between the data and the predicted behavior of the R/S statistic under the null hypothesis need not come from long-range dependence, but may merely be a symptom of short-term memory.

In particular, Lo (1991) shows that under RW1 the asymptotic distribution of (1/√n) Q̃_n is given by the random variable V, the range of a Brownian bridge, but under a stationary AR(1) specification with autoregressive coefficient φ the normalized R/S statistic converges to ξV, where ξ ≡ √((1 + φ)/(1 − φ)). For weekly returns of some portfolios of common stock, φ̂ is as large as 50%, implying that the mean of Q̃_n/√n may be biased upward by 73%! Since the mean of V is √(π/2) ≈ 1.25, the mean of the normalized classical rescaled range Q̃_n/√n would be 2.16 for such an AR(1) process.

Lo (1991) develops a modification of the R/S statistic to account for the effects of short-range dependence, derives an asymptotic sampling theory under several null and alternative hypotheses, and demonstrates via Monte Carlo simulations and empirical examples drawn from recent historical stock market data that the modified R/S statistic is considerably more accurate, often yielding inferences that contradict those of its classical counterpart. In particular, what the earlier literature had assumed was evidence of long-range dependence in US stock returns may well be the result of quickly decaying short-range dependence instead.
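The text does not reproduce Lo's (1991) modified statistic, so the following is a sketch under an explicit assumption: it replaces s_n in (2.6.10) with the square root of s_n² + 2Σ_{j=1}^{q} ω_j(q)γ̂_j, where γ̂_j are sample autocovariances and ω_j(q) = 1 − j/(q + 1) are Bartlett weights, the form used in Lo (1991). The truncation lag q (unrelated to the variance-ratio q) is left to the user, and the AR(1) helper is hypothetical test data. The function ar1_rs_scale evaluates the factor ξ quoted above:

```python
import math
import random


def rs_range(dev):
    """Range of the partial sums of already-demeaned observations."""
    s, hi, lo = 0.0, -math.inf, math.inf
    for e in dev:
        s += e
        hi, lo = max(hi, s), min(lo, s)
    return hi - lo


def modified_rs(r, q):
    """Modified R/S with Bartlett-weighted autocovariances up to lag q; q = 0 gives Q~_n."""
    n = len(r)
    mean = sum(r) / n
    dev = [x - mean for x in r]
    var = sum(e * e for e in dev) / n                      # s_n^2
    for j in range(1, q + 1):
        w = 1.0 - j / (q + 1.0)                            # Bartlett weight omega_j(q)
        gamma_j = sum(dev[i] * dev[i - j] for i in range(j, n)) / n
        var += 2.0 * w * gamma_j
    return rs_range(dev) / math.sqrt(var)


def ar1_rs_scale(phi):
    """xi = sqrt((1 + phi) / (1 - phi)), the AR(1) scaling of the limit of Q~_n / sqrt(n)."""
    return math.sqrt((1.0 + phi) / (1.0 - phi))


def simulate_ar1(phi, n, seed=0):
    """Hypothetical helper: n draws from a Gaussian AR(1) with coefficient phi."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


if __name__ == "__main__":
    x = simulate_ar1(0.5, 2000, seed=3)
    n = len(x)
    print(modified_rs(x, 0) / math.sqrt(n), modified_rs(x, 10) / math.sqrt(n))
    print(ar1_rs_scale(0.5), math.sqrt(math.pi / 2) * ar1_rs_scale(0.5))
```

For φ = 0.5, ar1_rs_scale returns √3 ≈ 1.73, the 73% upward bias; multiplying by √(π/2) gives about 2.17, while the text's 2.16 uses the rounded value 1.25 for the mean of V.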


2.7 Unit Root Tests 


A more recent and more specialized class of tests that are often confused with tests of the random walk hypotheses is the collection of unit root tests in which the null hypothesis is

$$X_t = \mu + X_{t-1} + \epsilon_t, \qquad (2.7.1)$$

often with the following alternative hypothesis:

$$X_t = \mu t + \phi\,\big(X_{t-1} - \mu(t-1)\big) + \epsilon_t, \qquad \phi \in (-1, 1), \qquad (2.7.2)$$

where ε_t is any zero-mean stationary process, such that

$$0 < \sigma^2 \equiv \lim_{T \to \infty} E\left[\frac{1}{T}\left(\sum_{t=1}^{T} \epsilon_t\right)^2\right] < \infty. \qquad (2.7.3)$$

Heuristically, condition (2.7.3) requires that the variance of the partial sum Σ_{t=1}^{T} ε_t increase at approximately the same rate as T, so that each new ε_t added to the partial sum has a nontrivial contribution to the partial sum's variance. This condition ensures that the usual limit theorems are applicable to the ε_t's, and it is satisfied by virtually all of the stationary processes that we shall have occasion to study (except for those in Section 2.6).
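A short exact calculation makes condition (2.7.3) concrete. Take, as an assumed example, an MA(1) process ε_t = η_t − θη_{t−1} with unit-variance white noise η_t. Writing the partial sum in terms of the η's (η_T enters once, η_1 through η_{T−1} each enter with weight 1 − θ, and η_0 enters with weight −θ) gives Var(Σ_{t=1}^{T} ε_t)/T = [1 + (T − 1)(1 − θ)² + θ²]/T. This tends to (1 − θ)² > 0 when θ ≠ 1, but collapses to 2/T → 0 when θ = 1, a unit root in the MA polynomial that violates the lower bound in (2.7.3):

```python
def partial_sum_variance_ratio(theta, T):
    """Var(sum_{t=1}^T eps_t) / T for eps_t = eta_t - theta*eta_{t-1}, Var(eta_t) = 1 (exact)."""
    return (1.0 + (T - 1) * (1.0 - theta) ** 2 + theta ** 2) / T


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        print(theta, [round(partial_sum_variance_ratio(theta, T), 6) for T in (10, 1000, 100000)])
```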


If the partial sums variance were to grow slower than T, so that the limit in (2.7.3) were 0, the uncertainty in the sequence of ε's would be "cancelling out" over time and would not be a very useful model of random price dynamics. An example of such a process is an MA(1) with a unit root, i.e., ε_t = η_t − η_{t−1}, where η_t is white noise.

If the partial sums variance were to grow faster than T, so that the limit in (2.7.3) were ∞, this would be an example of long-range dependence, in which the autocorrelation function of the ε's decays very slowly. An example of such a process is a fractionally differenced process (1 − L)^d ε_t = η_t with d > 0, where η_t is white noise. See Section 2.6 and Lo (1991) for further discussion.
The unit root test is designed to reveal whether X_t is difference stationary (the null hypothesis) or trend stationary (the alternative hypothesis); this distinction rests on whether φ is unity, hence the term unit root hypothesis. The test itself is formed by comparing the ordinary least squares estimator φ̂ to unity via its (nonstandard) sampling distribution under the null hypothesis (2.7.1), which was first derived by Dickey and Fuller (1979). Under the null hypothesis, any shock to X_t is said to be permanent since E[X_{t+k} | X_t] = μk + X_t for all k > 0, and a shock to X_t will appear in the conditional expectation of all future X_{t+k}. In this case X_t is often called a stochastic trend since its conditional expectation depends explicitly on the stochastic variable X_t. In contrast, under the alternative (2.7.2), a shock to X_t is said to be temporary, since E[X_{t+k} | X_t] = μ(t + k) + φ^k(X_t − μt), and the influence of X_t on the conditional expectation of future X_{t+k} diminishes as k increases.
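To see the nonstandard sampling distribution at work, the following Monte Carlo sketch simulates the null (2.7.1) with μ = 0 and IID Gaussian ε_t, regresses X_t on a constant and X_{t−1} by OLS, and records T(φ̂ − 1). These choices (μ = 0, Gaussian shocks, T = 250, 1,000 replications, and a regression without a time trend even though (2.7.2) contains one) are illustrative assumptions. Under the null the draws are shifted well to the left of zero, so comparing φ̂ to unity with standard normal critical values would be misleading:

```python
import random


def ols_slope_with_constant(x):
    """OLS slope phi_hat from regressing x_t on a constant and x_{t-1}."""
    y, z = x[1:], x[:-1]
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    szz = sum((b - mz) ** 2 for b in z)
    szy = sum((b - mz) * (a - my) for a, b in zip(y, z))
    return szy / szz


def simulate_df_stat(phi=1.0, T=250, reps=1000, seed=1):
    """Monte Carlo draws of T*(phi_hat - 1) when x_t = phi*x_{t-1} + eps_t, eps_t ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x = [0.0]
        for _ in range(T):
            x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
        draws.append(T * (ols_slope_with_constant(x) - 1.0))
    return draws


if __name__ == "__main__":
    draws = simulate_df_stat()
    print("mean of T(phi_hat - 1):", round(sum(draws) / len(draws), 2))
    print("share below zero:", sum(d < 0 for d in draws) / len(draws))
```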

Because the ε_t's are allowed to be an arbitrary zero-mean stationary process under both the unit root null (2.7.1) and alternative hypothesis (2.7.2), the focus of the unit root test is not on the predictability of X_t, as it is under the random walk hypotheses. Even under the null hypothesis (2.7.1), the increments of X_t may be predictable. Despite the fact that the random walk hypotheses are contained in the unit root null hypothesis, it is the permanent/temporary nature of shocks to X_t that concerns such tests. Indeed, since there are also nonrandom walk alternatives in the unit root null hypothesis, tests of unit roots are clearly not designed to detect predictability, but are in fact insensitive to it by construction.


2.8 Recent Empirical Evidence 




Predictability in asset returns is a very broad and active research topic, and it is impossible to provide a complete survey of this vast literature in just a few pages. Therefore, in this section we focus exclusively on the recent empirical literature. We hope to give readers a sense for the empirical relevance of predictability in recent equity markets by applying the tests developed in the earlier sections to stock indexes and individual stock returns using daily and weekly data from 1962 to 1994 and monthly data from 1926 to 1994. Despite the specificity of these examples, the empirical results illustrate many of the issues that have arisen in the broader search for predictability among asset returns.

Since then, advances in econometric methods have yielded many extensions and generalizations to this simple framework: tests for multiple unit roots in multivariate ARIMA systems, tests for cointegration, consistent estimation of models with unit roots and cointegration, etc. (see Campbell and Perron [1991] for a thorough survey of this literature).

However, we would be remiss if we did not cite the rich empirical tradition on which the recent literature is built, which includes: Alexander (1961, 1964), Cootner (1964), Cowles (1960), Cowles and Jones (1937), Fama (1965), Fama and Blume (1966), Kendall (1953), Granger and Morgenstern (1963), Mandelbrot (1963), Osborne (1959, 1962), Roberts (1959), and Working (1960).


2.8.1 Autocorrelations

Table 2.4 reports the means, standard deviations, autocorrelations, and Box-Pierce Q-statistics for daily, weekly, and monthly CRSP stock returns indexes from July 3, 1962 to December 30, 1994. During this period, panel A of Table 2.4 reports that the daily equal-weighted CRSP index has a first-order autocorrelation ρ̂(1) of 35.0%. Recall from Section 2.4.1 that under the IID random walk null hypothesis RW1, the asymptotic sampling distribution of ρ̂(1) is normal with mean 0 and standard deviation 1/√T (see (2.4.14)). This implies that a sample size of 8,179 observations yields a standard error of 1.11% for ρ̂(1); hence an autocorrelation of 35.0% is clearly statistically significant at all conventional levels of significance. Moreover, the Box-Pierce Q-statistic with five autocorrelations has a value of 263.3, which is significant at all the conventional significance levels (recall that this statistic is distributed asymptotically as a χ²_5 variate for which the 99.5-percentile is 16.7).

Similar calculations for the value-weighted index in panel A show that both CRSP daily indexes exhibit statistically significant positive serial correlation at the first lag, although the equal-weighted index has higher autocorrelation, which decays more slowly than that of the value-weighted index. The subsample autocorrelations demonstrate that the significance of the autocorrelations is not an artifact of any particularly influential subset of the data; both indexes are strongly positively autocorrelated in each subsample.

To develop a sense of the economic significance of the autocorrelations in Table 2.4, observe that the R² of a regression of returns on a constant and its first lag is the square of the slope coefficient, which is simply the first-order autocorrelation. Therefore, an autocorrelation of 35.0% implies that 12.3% of the variation in the daily CRSP equal-weighted index return is predictable using the preceding day's index return.
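The arithmetic behind these statements is quick to reproduce. The sketch below takes ρ̂(1) = 0.350 and T = 8,179 from the text; the function name is illustrative:

```python
import math


def first_order_autocorr_summary(rho1, T):
    """Standard error 1/sqrt(T) under RW1, the implied z-score, and the AR(1) regression R^2."""
    se = 1.0 / math.sqrt(T)
    return se, rho1 / se, rho1 ** 2


if __name__ == "__main__":
    se, z, r2 = first_order_autocorr_summary(0.350, 8179)   # daily CRSP equal-weighted index
    print(f"s.e. = {se:.2%}, z = {z:.1f}, R^2 = {r2:.2%}")
```

It returns a standard error of about 1.11% and an R² of 0.1225, which the text reports as 12.3%.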


Unless stated otherwise, we take returns to be continuously compounded. Portfolio returns are calculated first from simple returns and then are converted to a continuously compounded return. The weekly return of each security is computed as the return from Tuesday's closing price to the following Tuesday's closing price. If the following Tuesday's price is missing, then Wednesday's price (or Monday's if Wednesday's is also missing) is used. If both Monday's and Wednesday's prices are missing, the return for that week is reported as missing; this occurs only rarely. To compute weekly returns on size-sorted portfolios, for each week all stocks with nonmissing returns that week are assigned to portfolios based on the beginning-of-year market value. If the beginning-of-year market value is missing, then the end-of-year value is used. If both market values are missing, the stock is not assigned to a portfolio.




Table 2.4. Autocorrelation in daily, weekly, and monthly stock index returns.

Sample Period        Sample Size   Mean    SD      ρ̂1     ρ̂2     ρ̂3      ρ̂4     Q̂5        Q̂10

A. Daily Returns
CRSP Value-Weighted Index
62:07:03-94:12:30    8,179         0.041   0.824   17.6   −0.7   0.1     −0.8   263.3     269.5
62:07:03-78:10:27    4,090         0.028   0.748   27.8   1.2    4.6     3.3    329.4     343.5
78:10:30-94:12:30    4,089         0.054   0.901   10.8   −2.2   −2.9    −3.5   69.5      72.1
CRSP Equal-Weighted Index
62:07:03-94:12:30    8,179         0.070   0.764   35.0   9.3    8.5     9.9    1,301.9   1,369.5
62:07:03-78:10:27    4,090         0.063   0.771   43.1   13.0   15.3    15.2   1,062.2   1,110.2
78:10:30-94:12:30    4,089         0.078   0.756   26.2   4.9    2.0     4.9    348.9     379.5

B. Weekly Returns
CRSP Value-Weighted Index
62:07:10-94:12:27    1,695         0.196   2.093   1.5    −2.5   3.5     −0.7   8.8       36.7
62:07:10-78:10:03    848           0.144   1.994   5.6    −3.7   5.3     1.6    9.0       21.5
78:10:10-94:12:27    847           0.248   2.188   −2.0   −1.5   1.6     −3.3   5.3       25.2
CRSP Equal-Weighted Index
62:07:10-94:12:27    1,695         0.339   2.321   20.3   6.1    9.1     4.8    94.3      109.5
62:07:10-78:10:03    848           0.324   2.460   21.8   7.5    11.9    6.1    60.4      68.5
78:10:10-94:12:27    847           0.354   2.174   18.4   4.3    5.5     2.2    33.7      51.3

C. Monthly Returns
CRSP Value-Weighted Index
62:07:31-94:12:30    390           0.861   4.336   4.3    −5.3   −1.3    −0.4   6.8       12.5
62:07:31-78:09:29    195           0.646   4.219   6.4    3.8    7.3     6.2    3.9       9.7
78:10:31-94:12:30    195           1.076   4.450   1.3    −6.3   −8.3    −7.7   7.5       14.0
CRSP Equal-Weighted Index
62:07:31-94:12:30    390           1.077   5.749   17.1   −3.4   −3.3    −1.6   12.8      21.3
62:07:31-78:09:29    195           1.049   6.148   18.4   −2.5   4.4     2.4    7.5       12.6
78:10:31-94:12:30    195           1.105   5.336   15.0   −1.6   −12.4   −7.4   8.9       14.2

Autocorrelation coefficients (in percent) and Box-Pierce Q-statistics for CRSP daily, weekly, and monthly value- and equal-weighted return indexes for the sample period from July 3, 1962 to December 30, 1994 and subperiods.




The weekly and monthly return autocorrelations reported in panels B and C of Table 2.4, respectively, exhibit patterns similar to those of the daily autocorrelations: positive and statistically significant at the first lag over the entire sample and for all subsamples, with smaller and sometimes negative higher-order autocorrelations.


2.8.2 Variance Ratios 


The fact that the autocorrelations of daily, weekly, and monthly index returns in Table 2.4 are positive and often significantly different from zero has implications for the behavior of the variance ratios of Section 2.4, and we explore these implications in this section for the returns of indexes, portfolios, and individual securities.


CRSP Indexes 
The autocorrelations in Table 2.4 suggest variance ratios greater than one, and this is confirmed in Table 2.5, which reports variance ratios VR(q) defined in (2.4.37) and, in parentheses, heteroskedasticity-consistent asymptotically standard normal test statistics ψ*(q) defined in (2.4.44), for weekly CRSP equal- and value-weighted market return indexes. Panel A contains results for the equal-weighted index and panel B contains results for the value-weighted index. Within each panel, the first row presents the variance ratios and test statistics for the entire 1,695-week sample and the next two rows present similar results for the two subsamples of 848 and 847 weeks.

Panel A shows that the random walk null hypothesis RW3 is rejected at all the usual significance levels for the entire time period and all subperiods for the equal-weighted index. Moreover, the rejections are not due to changing variances since the ψ*(q)'s are heteroskedasticity consistent. The estimates of the variance ratio are larger than one for all cases. For example, the entries in the first column of panel A correspond to variance ratios with an aggregation value q of 2. In view of (2.4.18), ratios with q = 2 are approximately equal to 1 plus the first-order autocorrelation coefficient estimator of weekly returns; hence, the entry in the first row, 1.20, implies that the first-order autocorrelation for weekly returns is approximately 20%, which is consistent with the value reported in Table 2.4. With a corresponding ψ*(q) statistic of 4.53, the random walk hypothesis is resoundingly rejected.

The subsample results show that although RW3 is easily rejected over both halves of the sample period, the variance ratios are slightly larger and the rejections slightly stronger over the first half. This pattern is repeated in Table 2.6 and in other empirical studies of predictability in US stock

Since in our sample the values of ψ*(q), which are computed under the null hypothesis RW3, are always statistically less significant than the values of ψ(q) calculated under RW1, to conserve space we report only the more conservative ψ*(q)'s.
Table 2.5. Variance ratios for weekly stock index returns.

                     Number nq of base   Number q of base observations aggregated
Sample period        observations        to form variance ratio
                                         2          4          8          16

A. CRSP Equal-Weighted Index
62:07:10-94:12:27    1,695               1.20       1.42       1.65       1.74
                                         (4.53)*    (5.80)*    (5.84)*    (4.85)*
62:07:10-78:10:03    848                 1.22       1.47       1.74       1.90
                                         (3.47)*    (4.44)*    (4.87)*    (4.24)*
78:10:10-94:12:27    847                 1.19       1.35       1.48       1.54
                                         (2.96)*    (2.96)*    (3.00)*    (2.55)*

B. CRSP Value-Weighted Index
62:07:10-94:12:27    1,695               1.02       1.02       1.04       1.02
                                         (0.51)     (0.30)     (0.41)     (0.14)
62:07:10-78:10:03    848                 1.06       1.08       1.14       1.19
                                         (1.11)     (0.89)     (1.05)     (0.95)
78:10:10-94:12:27    847                 0.98       0.97       0.93       0.88
                                         (−0.45)    (−0.40)    (−0.50)    (−0.64)

Variance-ratio test of the random walk hypothesis for CRSP equal- and value-weighted indexes, for the sample period from July 10, 1962 to December 27, 1994 and subperiods. The variance ratios VR(q) are reported in the main rows, with heteroskedasticity-consistent test statistics ψ*(q) given in parentheses immediately below each main row. Under the random walk null hypothesis, the value of the variance ratio is one and the test statistics have a standard normal distribution asymptotically. Test statistics marked with asterisks indicate that the corresponding variance ratios are statistically different from one at the 5% level of significance.


returns: the degree of predictability seems to be declining through time. To the extent that such predictability has been a source of "excess" profits, its decline is consistent with the fact that financial markets have become increasingly competitive over the sample period.

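For readers who want to compute statistics like those in Table 2.5, the sketch below produces an overlapping variance ratio and a heteroskedasticity-consistent z-statistic in the style of Lo and MacKinlay (1988). Definitions (2.4.37) and (2.4.44) are not reproduced in this section, so the exact estimator here is an assumption: it uses the bias-corrected divisors q(nq − q + 1)(1 − q/nq) and nq − 1 and the weights [2(q − j)/q]² on the fourth-moment terms δ̂(j). The AR(1) helpers are hypothetical test data, not CRSP returns:

```python
import math
import random


def lo_mackinlay_vr(p, q):
    """Overlapping VR(q) and heteroskedasticity-consistent z-statistic for log prices p_0..p_N."""
    if q < 2:
        raise ValueError("q must be at least 2")
    N = len(p) - 1                                          # N = nq base observations
    mu = (p[N] - p[0]) / N                                  # mean one-period return
    dev = [p[k] - p[k - 1] - mu for k in range(1, N + 1)]   # demeaned one-period returns
    ss = sum(e * e for e in dev)
    sigma_a2 = ss / (N - 1)                                 # one-period variance
    m = q * (N - q + 1) * (1.0 - q / N)                     # bias-correcting divisor
    sigma_c2 = sum((p[k] - p[k - q] - q * mu) ** 2 for k in range(q, N + 1)) / m
    vr = sigma_c2 / sigma_a2
    theta = 0.0
    for j in range(1, q):
        delta_j = N * sum(dev[k] ** 2 * dev[k - j] ** 2 for k in range(j, N)) / ss ** 2
        theta += (2.0 * (q - j) / q) ** 2 * delta_j
    return vr, math.sqrt(N) * (vr - 1.0) / math.sqrt(theta)


def simulate_ar1_returns(phi, n, seed=0):
    """Hypothetical helper: n Gaussian AR(1) returns with coefficient phi."""
    rng = random.Random(seed)
    r, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        r.append(prev)
    return r


def log_prices_from_returns(r):
    """Cumulate returns into log prices p_0 = 0, p_1, ..., p_n."""
    p = [0.0]
    for x in r:
        p.append(p[-1] + x)
    return p


if __name__ == "__main__":
    p = log_prices_from_returns(simulate_ar1_returns(0.3, 2000, seed=7))
    for q in (2, 4, 8, 16):
        vr, z = lo_mackinlay_vr(p, q)
        print(q, round(vr, 3), round(z, 2))
```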
The variance ratios for the equal-weighted index generally increase with q: the variance ratio climbs from 1.20 (for q = 2) to 1.74 (for q = 16), and the subsample results show a similar pattern. To interpret this pattern, observe that an analog of (2.4.18) can be derived for ratios of variance ratios:

$$\frac{\text{VR}(2q)}{\text{VR}(q)} = 1 + \rho_q(1), \qquad (2.8.1)$$

where ρ_q(1) is the first-order autocorrelation coefficient for q-period returns r_t + r_{t−1} + ··· + r_{t−q+1}. Therefore, the fact that the variance ratios in panel A of Table 2.5 are increasing implies positive serial correlation in multiperiod returns. For example, VR(4)/VR(2) = 1.42/1.20 ≈ 1.18, which implies that 2-week returns have a first-order autocorrelation coefficient of approximately 18%.
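Applying (2.8.1) to each adjacent pair of entries in the first row of panel A gives the implied first-order autocorrelations of multiweek returns. The sketch below does this arithmetic, using VR(1) = 1 by definition; the function name is illustrative:

```python
def implied_multiperiod_autocorr(vr_2q, vr_q):
    """rho_q(1) implied by (2.8.1): VR(2q)/VR(q) = 1 + rho_q(1)."""
    return vr_2q / vr_q - 1.0


# Panel A of Table 2.5, full sample (equal-weighted index); VR(1) = 1 by definition.
vr = {1: 1.00, 2: 1.20, 4: 1.42, 8: 1.65, 16: 1.74}

if __name__ == "__main__":
    for q in (1, 2, 4, 8):
        rho = implied_multiperiod_autocorr(vr[2 * q], vr[q])
        print(f"{q}-week returns: rho_q(1) ~ {rho:.1%}")
```

The implied values are roughly 20%, 18%, 16%, and 5% for 1-, 2-, 4-, and 8-week returns, so the positive serial correlation weakens as the horizon lengthens.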

Panel B of Table 2.5 shows that the value-weighted index behaves differently. Over the entire sample period, the variance ratios are all greater than one, but not by much, ranging from 1.02 for q = 2 to 1.04 for q = 8. Moreover, the test statistics ψ*(q) are all statistically insignificant; hence RW3 cannot be rejected for any q. The subsample results show that during the first half of the sample period, the variance ratios for the value-weighted index do increase with q (implying positive serial correlation for multiperiod returns), but during the second half of the sample, the variance ratios decline with q (implying negative serial correlation for multiperiod returns). These two opposing patterns are responsible for the relatively stable behavior of the variance ratios over the entire sample period.

Although the test statistics in Table 2.5 are based on nominal stock 
returns, it is apparent that virtually the same results would obtain with real 
or excess returns. Since the volatility of weekly nominal returns is so much 
larger than that of the inflation and Treasury-bill rates, the use of nominal, 
real, or excess returns in volatility-based tests will yield practically identical 
inferences. 


Size-Sorted Portfolios
The fact that RW3 is rejected by the equal-weighted index but not by the value-weighted index suggests that market capitalization or size may play a role in the behavior of the variance ratios. To obtain a better sense of this intuition, Table 2.6 presents variance ratios for the returns of size-sorted portfolios. We compute weekly returns for five size-sorted portfolios from the CRSP NYSE-AMEX daily returns file. Stocks with returns for any given week are assigned to portfolios based on which quintile their beginning-of-year market capitalization belongs to. The portfolios are equal-weighted and have a changing composition. Panel A of Table 2.6 reports the results for the portfolio of small firms (first quintile), panel B reports the results for the portfolio of medium-size firms (third quintile), and panel C reports the results for the portfolio of large firms (fifth quintile).
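The portfolio construction just described can be sketched for a single week as follows. The sketch is illustrative and makes several assumptions the text leaves open: stocks are ranked by market value among those with a return that week, the ranked list is split as evenly as possible (extra stocks go to the smaller-size groups), and a stock with a missing market value is simply skipped rather than falling back to its end-of-year value as the footnote on return construction describes:

```python
import math


def size_quintile_returns(mcap, ret, n_groups=5):
    """Continuously compounded, equal-weighted weekly returns by size group, smallest first.

    mcap: beginning-of-year market values; ret: that week's simple returns.
    Entries equal to None are treated as missing and the stock is skipped.
    """
    valid = [i for i in range(len(ret)) if ret[i] is not None and mcap[i] is not None]
    valid.sort(key=lambda i: mcap[i])                       # rank by market value
    n = len(valid)
    if n < n_groups:
        raise ValueError("need at least one stock per group")
    out, start = [], 0
    for g in range(n_groups):
        size = n // n_groups + (1 if g < n % n_groups else 0)
        members = valid[start:start + size]
        start += size
        simple = sum(ret[i] for i in members) / size        # equal-weighted simple return
        out.append(math.log1p(simple))                      # convert to continuous compounding
    return out


if __name__ == "__main__":
    mcap = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 10.0, 4.0, 8.0, 6.0]
    ret = [0.01, 0.03, -0.01, 0.02, 0.00, 0.04, -0.02, 0.01, 0.00, 0.01]
    print([round(x, 5) for x in size_quintile_returns(mcap, ret)])
```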

Evidence against the random walk hypothesis for the portfolio of companies in the smallest quintile is strong for the entire sample and for both subsamples: in panel A all the ψ*(q) statistics are well above the 5% critical value of 1.96, ranging from 4.67 to 10.74. The variance ratios are all greater

*We also performed our tests using value-weighted portfolios and obtained essentially
the same results. The only difference appears in the largest quintile of the value-weighted
portfolio, for which the random walk hypothesis was generally not rejected. This, of course,
is not surprising, given that the largest value-weighted quintile is quite similar to the
value-weighted market index.
Table 2.6. Variance ratios for weekly size-sorted portfolio returns.

                                     Number q of base observations
                     Number nq       aggregated to form variance ratio
Time period          of base obs.    2          4          8          16

A. Portfolio of firms with market values in smallest CRSP quintile

62:07:10-94:12:27    1,695           1.35       1.77       2.24       2.46
                                     (7.15)*    (9.42)*    (10.74)*   (9.33)*
62:07:10-78:10:03      848           1.34       1.76       2.22       2.46
                                     (5.47)*    (7.33)*    (8.03)*    (6.97)*
78:10:10-94:12:27      847           1.37       1.79       2.22       2.40
                                     (4.67)*    (5.91)*    (6.89)*    (6.60)*

B. Portfolio of firms with market values in central CRSP quintile

62:07:10-94:12:27    1,695           1.20       1.39       1.59       1.65
                                     (4.25)*    (4.85)*    (5.16)*    (4.17)*
62:07:10-78:10:03      848           1.21       1.43       1.66       1.79
                                     (3.25)*    (4.03)*    (4.27)*    (3.67)*
78:10:10-94:12:27      847           1.19       1.33       1.44       1.47
                                     (2.79)*    (2.74)*    (2.63)*    (2.14)*

C. Portfolio of firms with market values in largest CRSP quintile

62:07:10-94:12:27    1,695           1.00       1.10       1.14       1.11
                                     (1.71)     (1.46)     (1.38)     (0.76)
62:07:10-78:10:03      848           1.11       1.21       1.30       1.32
                                     (2.05)*    (2.15)*    (2.12)*    (1.59)
78:10:10-94:12:27      847           1.01       1.00       0.98       0.92
                                     (0.29)     (0.05)     (-0.13)    (-0.41)

Variance-ratio test of the random walk hypothesis for size-sorted portfolios, for the sample
period from July 10, 1962 to December 27, 1994, and subperiods. The variance ratios VR(q)
are reported in the main rows, with heteroskedasticity-consistent test statistics ψ*(q) given in
parentheses immediately below each main row. Under the random walk null hypothesis, the
value of the variance ratio is one and the test statistics have a standard normal distribution
asymptotically. Test statistics marked with asterisks indicate that the corresponding variance
ratios are statistically different from one at the 5% level of significance.


than one, implying a first-order autocorrelation of 35% for weekly returns
over the entire sample period.
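Because VR(2) = 1 + ρ(1), a two-week variance ratio of 1.35 corresponds to a first-order autocorrelation of about 35%. The sketch below shows one way to compute VR(q) and a heteroskedasticity-consistent ψ*(q) of the kind reported in these tables. It assumes the unadjusted overlapping estimator (no small-sample bias corrections) applied to a list of one-period log returns; the function names `variance_ratio` and `simulate_ar1` are hypothetical, and the AR(1) series is simulated, not CRSP data.

```python
import math
import random


def variance_ratio(returns, q):
    """Return (VR(q), psi_star) for a list of one-period log returns.

    Unadjusted overlapping estimator (no small-sample bias corrections) with a
    heteroskedasticity-consistent standard error; requires q >= 2.
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    T = len(returns)  # number of base observations, nq in the text
    mu = sum(returns) / T
    dev = [r - mu for r in returns]
    ss = sum(d * d for d in dev)
    var_a = ss / T  # variance of one-period returns

    # Overlapping q-period sums of demeaned returns, divided by nq * q so that
    # var_c estimates the one-period variance implied by q-period returns
    q_sums = [sum(dev[k - q + 1:k + 1]) for k in range(q - 1, T)]
    var_c = sum(s * s for s in q_sums) / (T * q)
    vr = var_c / var_a

    # theta = sum over j of [2(q - j)/q]^2 * delta_j, where delta_j is a
    # heteroskedasticity-consistent variance for the lag-j autocorrelation
    theta = 0.0
    for j in range(1, q):
        num = sum(dev[k] ** 2 * dev[k - j] ** 2 for k in range(j, T))
        delta_j = T * num / ss ** 2
        theta += (2.0 * (q - j) / q) ** 2 * delta_j

    psi_star = math.sqrt(T) * (vr - 1.0) / math.sqrt(theta)
    return vr, psi_star


def simulate_ar1(phi, n, seed=0):
    """Hypothetical AR(1) return series r_t = phi * r_{t-1} + e_t, e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    r, out = 0.0, []
    for _ in range(n):
        r = phi * r + rng.gauss(0.0, 1.0)
        out.append(r)
    return out


# Usage: 1,695 simulated "weekly" returns with first-order autocorrelation 0.35
weekly = simulate_ar1(0.35, 1695)
for q in (2, 4, 8, 16):
    vr, psi = variance_ratio(weekly, q)
    print(f"q={q:2d}  VR={vr:.2f}  psi*={psi:.2f}")
```

With q = 2 the estimator reduces, up to end effects, to one plus the sample first-order autocorrelation, which is why the q = 2 column can be read directly as 1 + ρ̂(1).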

For the portfolios of medium-size companies, the ψ*(q) statistics in
panel B show that there is also strong evidence against RW3, although
the variance ratios are smaller now, implying lower serial correlation. For
the portfolio of the largest firms, panel C shows that evidence against RW3
is sparse, limited only to the first half of the sample period.


The results for size-based portfolios are generally consistent with those 
for the market indexes: variance ratios are generally greater than one and 
increasing in q, implying positive serial correlation in multiperiod returns,
statistically significant for portfolios of all but the largest companies, and 
more significant during the first half of the sample period than the second 
half. 


Individual Securities 

Having shown that the random walk hypothesis is inconsistent with the be-
havior of the equal-weighted index and portfolios of small- and medium-size
companies, we now turn to the case of individual security returns. Table 2.7
reports the cross-sectional average of the variance ratios of individual stocks
that have complete return histories in the CRSP database for our entire
1,695-week sample period, a sample of 411 companies. Panel A contains the
cross-sectional average of the variance ratios of the 411 stocks, as well as of
the 100 smallest, 100 intermediate, and 100 largest stocks.* Cross-sectional
standard deviations are given in parentheses below the main rows. Since the
variance ratios are clearly not cross-sectionally independent, these standard
deviations cannot be used to form the usual tests of significance; they are
reported only to provide some indication of the cross-sectional dispersion
of the variance ratios.

The average variance ratio with q=2 is 0.96 for the 411 individual secu-
rities, implying that there is negative serial correlation on average. For all
stocks, the average serial correlation is -4%, and -5% for the smallest 100
stocks. However, the serial correlation is both statistically and economically
insignificant and provides little evidence against the random walk hypoth-
esis. For example, the largest (in absolute value) average ψ*(q) statistic over
all stocks occurs for q=4 and is -0.90 (with a cross-sectional standard deviation
of 1.19); the largest average ψ*(q) for the 100 smallest stocks is -1.67 (for q=2,
with a cross-sectional standard deviation of 1.75). These results are consistent
with French and Roll's (1986) finding that daily returns of individual securities
are slightly negatively autocorrelated.

For comparison, panel B reports the variance ratios of equal- and value-
weighted portfolios of the 411 securities. The results are consistent with
those in Tables 2.5 and 2.6: significant positive autocorrelation for the equal-
weighted portfolio, and autocorrelation close to zero for the value-weighted
portfolio.

That the returns of individual securities have statistically insignificant au-
tocorrelation is not surprising. Individual returns contain much company-
specific or idiosyncratic noise that makes it difficult to detect the presence of
predictable components. Since the idiosyncratic noise is largely attenuated
by forming portfolios, we would expect to uncover the predictable systematic
component more readily when securities are combined. Nevertheless, the
weak negative autocorrelations of the individual securities are an interesting
contrast to the stronger positive autocorrelation of the portfolio returns.

*Mid-sample market values are used as the size measure.

Table 2.7. Variance ratios for weekly individual security returns.

                                        Number q of base observations
                          Number nq     aggregated to form variance ratio
Sample                    of base obs.  2          4          8          16

A. Averages of variance ratios over individual securities

All stocks                1,695         0.96       0.92       0.89       0.85
(411 stocks)                            (0.04)     (0.07)     (0.11)     (0.14)
Small stocks              1,695         0.95       0.90       0.88       0.85
(100 stocks)                            (0.06)     (0.09)     (0.12)     (0.15)
Medium stocks             1,695         0.96       0.93       0.90       0.85
(100 stocks)                            (0.04)     (0.07)     (0.09)     (0.13)
Large stocks              1,695         0.95       0.91       0.89       0.86
(100 stocks)                            (0.03)     (0.06)     (0.11)     (0.15)

B. Variance ratios of equal- and value-weighted portfolios of all stocks

Equal-weighted portfolio  1,695         1.11       1.20       1.30       1.29
(411 stocks)                            (2.75)*    (2.83)*    (2.88)*    (1.99)*
Value-weighted portfolio  1,695         0.99       0.97       0.96       0.93
(411 stocks)                            (-0.26)    (-0.43)    (-0.42)    (-0.53)

Means of variance ratios over all individual securities with complete return histories during the sample period from July 10,
1962 to December 27, 1994 (411 stocks). Means of variance ratios for the smallest 100 stocks, the intermediate 100 stocks,
and the largest 100 stocks are also reported. For purposes of comparison, panel B reports the variance ratios for equal- and
value-weighted portfolios, respectively, of the 411 stocks. Parenthetical entries for averages of individual securities (panel A)
are standard deviations of the cross section of variance ratios. Because the variance ratios are not cross-sectionally indepen-
dent, the standard deviations cannot be used to perform the usual significance tests; they are reported only to provide an
indication of the variance ratios' cross-sectional dispersion. Parenthetical entries for portfolio variance ratios (panel B) are
the heteroskedasticity-consistent ψ*(q) statistics. Asterisks indicate variance ratios that are statistically different from 1 at the
5% level of significance.


2.8.3 Cross-Autocorrelations and Lead-Lag Relations


Despite the fact that individual security returns are weakly negatively au-
tocorrelated, portfolio returns, which are essentially averages of individual
security returns, are strongly positively autocorrelated. This somewhat
paradoxical result can mean only one thing: large positive cross-autocor-
relations across individual securities over time.

To see this, consider a collection of N securities and denote by $R_t$ the
$(N \times 1)$ vector of their period-t simple returns $[R_{1t}\ \cdots\ R_{Nt}]'$. We switch to
simple returns here because the focus of our analysis is on the interaction
of returns within portfolios, and continuously compounded returns do not
aggregate across securities (see Section 1.4.1 in Chapter 1 for further discus-
sion). For convenience, we maintain the following assumption* throughout
this section:

(A1) $R_t$ is a jointly covariance-stationary stochastic process with expectation
$\mathrm{E}[R_t] = \mu \equiv [\mu_1\ \mu_2\ \cdots\ \mu_N]'$ and autocovariance matrices
$\mathrm{E}[(R_{t-k}-\mu)(R_t-\mu)'] \equiv \Gamma(k)$ where, with no loss of generality, we take
$k \geq 0$ since $\Gamma(-k) = \Gamma'(k)$.


If $\iota$ is defined to be a vector of ones $[1\ 1\ \cdots\ 1]'$, we can express the equal-
weighted market index as $R_{mt} \equiv \iota' R_t / N$. The first-order autocovariance
of $R_{mt}$ may then be decomposed into the sum of the first-order own-autoco-
variances and cross-autocovariances of the component securities:

$$\mathrm{Cov}[R_{mt-1}, R_{mt}] = \frac{\iota'\Gamma(1)\iota}{N^2} = \frac{\iota'\Gamma(1)\iota - \mathrm{tr}(\Gamma(1))}{N^2} + \frac{\mathrm{tr}(\Gamma(1))}{N^2}, \tag{2.8.2}$$

and therefore the first-order autocorrelation of $R_{mt}$ can be expressed as

$$\rho_m(1) \equiv \frac{\mathrm{Cov}[R_{mt-1}, R_{mt}]}{\mathrm{Var}[R_{mt}]} = \frac{\iota'\Gamma(1)\iota - \mathrm{tr}(\Gamma(1))}{\iota'\Gamma(0)\iota} + \frac{\mathrm{tr}(\Gamma(1))}{\iota'\Gamma(0)\iota}, \tag{2.8.3}$$

where tr(·) is the trace operator, which sums the diagonal entries of its square-
matrix argument. The first term of the right side of (2.8.3) contains only

*Assumption (A1) is made for notational simplicity, since joint covariance-stationarity al-
lows us to eliminate time-indexes from population moments such as μ and Γ(k); the qualitative
features of our results will not change under the weaker assumptions of weakly dependent
heterogeneously distributed vectors $R_t$. This would merely require replacing expectations
with corresponding probability limits of suitably defined time-averages. See Lo and MacKinlay
(1990c) for further details.
Table 2.8. Cross-autocorrelation matrices for size-sorted portfolio returns.

                  R1t      R2t      R3t      R4t      R5t
Υ̂(0)   R1t      1.000    0.938    0.892    0.839    0.728
        R2t      0.938    1.000    0.976    0.944    0.856
        R3t      0.892    0.976    1.000    0.979    0.914
        R4t      0.839    0.944    0.979    1.000    0.961
        R5t      0.728    0.856    0.914    0.961    1.000

Υ̂(1)   R1,t-1   0.352    0.226    0.171    0.115    0.024
        R2,t-1   0.330    0.232    0.182    0.129    0.037
        R3,t-1   0.324    0.244    0.197    0.147    0.053
        R4,t-1   0.310    0.242    0.201    0.153    0.059
        R5,t-1   0.265    0.223    0.187    0.147    0.057

Υ̂(2)   R1,t-2   0.163    0.089    0.057    0.032   -0.010
        R2,t-2   0.141    0.078    0.051    0.029   -0.010
        R3,t-2    ...     0.079    0.051    0.032     ...
        R4,t-2   0.121    0.071    0.046    0.028   -0.006
        R5,t-2   0.084    0.045    0.025    0.012   -0.016

Υ̂(3)   R1,t-3   0.155    0.106    0.074    0.050    0.027
        R2,t-3    ...     0.100    0.071    0.050     ...
        R3,t-3   0.143    0.105    0.077    0.058    0.039
        R4,t-3   0.137    0.104    0.079    0.061    0.044
        R5,t-3   0.120    0.093    0.074    0.061    0.047

Υ̂(4)   R1,t-4   0.104    0.063    0.036    0.016   -0.007
        R2,t-4   0.097    0.062    0.036    0.017   -0.006
        R3,t-4   0.095     ...     0.033    0.015   -0.011
        R4,t-4   0.100    0.067    0.034    0.023   -0.004
        R5,t-4   0.094    0.064    0.038    0.025   -0.001

Autocorrelation matrices of the vector $X_t \equiv [R_{1t}\ R_{2t}\ R_{3t}\ R_{4t}\ R_{5t}]'$, where $R_{it}$ is the week-
t return on the equal-weighted portfolio of stocks in the ith quintile, i = 1, ..., 5 (quintile
1 contains the smallest stocks), for the sample of NYSE-AMEX stocks from July 10, 1962 to
December 27, 1994 (1,695 observations). Note that $\Upsilon(k) \equiv D^{-1/2}\,\mathrm{E}[(X_{t-k}-\mu)(X_t-\mu)']\,D^{-1/2}$,
where $D \equiv \mathrm{diag}(\sigma_1^2, \ldots, \sigma_5^2)$; thus the (i, j)th element is the correlation between $R_{i,t-k}$ and
$R_{jt}$. Asymptotic standard errors for the autocorrelations under an IID null hypothesis are given
by $1/\sqrt{T} = 0.024$.


cross-autocovariances and the second term only the own-autocovariances. If
the own-autocovariances are generally negative, and the index autocovariance
is positive, then the cross-autocovariances must be positive. Moreover, the
cross-autocovariances must be large, large enough to exceed in magnitude the
sum of the negative own-autocovariances.
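The decomposition (2.8.3) can be checked numerically. The sketch below uses a hypothetical two-security example, with `gamma1[i][j]` read as Cov(R_{i,t-1}, R_{jt}) in keeping with the definition of Γ(k) in (A1), in which both own-autocovariances are negative and yet the index autocorrelation is positive because one cross-autocovariance is large. The numbers are invented for illustration, and the helper names are ours.

```python
def sum_all(M):
    """iota' M iota: the sum of every entry of a square matrix."""
    return sum(sum(row) for row in M)


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def index_autocorr_split(gamma0, gamma1):
    """Cross-effect term, own-effect term, and their sum, as in (2.8.3)."""
    denom = sum_all(gamma0)                            # iota' Gamma(0) iota
    cross = (sum_all(gamma1) - trace(gamma1)) / denom  # off-diagonal entries only
    own = trace(gamma1) / denom                        # diagonal entries only
    return cross, own, cross + own


# Hypothetical two-security example; gamma1[i][j] = Cov(R_{i,t-1}, R_{j,t}).
gamma0 = [[1.0, 0.5],
          [0.5, 1.0]]
gamma1 = [[-0.05, 0.10],
          [0.30, -0.05]]   # security 2's past return covaries with security 1's current return

cross, own, rho_m = index_autocorr_split(gamma0, gamma1)
print(f"cross = {cross:.4f}, own = {own:.4f}, rho_m(1) = {rho_m:.4f}")
# prints: cross = 0.1333, own = -0.0333, rho_m(1) = 0.1000
```

The own-effect term is negative, as for individual stocks, but the cross-effect term is four times as large, so the index autocorrelation is positive.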


Table 2.8 reports autocorrelation matrices $\hat\Upsilon(k)$ of the vector of weekly
returns of five size-sorted portfolios, formed from the sample of stocks using
weekly returns from July 10, 1962, to December 27, 1994 (1,695 observa-
tions). Let $X_t$ denote the vector $[R_{1t}\ R_{2t}\ R_{3t}\ R_{4t}\ R_{5t}]'$, where $R_{it}$ is the
return on the equal-weighted portfolio of stocks in the ith quintile. Then the
kth-order autocorrelation matrix of $X_t$ is given by

$$\Upsilon(k) \equiv D^{-1/2}\,\mathrm{E}[(X_{t-k}-\mu)(X_t-\mu)']\,D^{-1/2},$$

where $D \equiv \mathrm{diag}(\sigma_1^2, \ldots, \sigma_5^2)$ and $\mu \equiv \mathrm{E}[X_t]$. By this convention,
the (i, j)th element of $\Upsilon(k)$ is the correlation of $R_{i,t-k}$ with $R_{jt}$.
The estimator $\hat\Upsilon(k)$ is the usual sample autocorrelation matrix.
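A minimal sketch of the sample estimator Υ̂(k) follows, using the (i, j) convention just stated (row i is the lagged return, column j the current return) and 1/T normalization for both variances and autocovariances. The data come from a hypothetical two-portfolio model in which the "small" portfolio responds to last week's "large" return; the names `sample_autocorr_matrix` and `simulate_lead_lag`, and the lead coefficient of 0.3, are illustrative assumptions.

```python
import math
import random


def sample_autocorr_matrix(X, k):
    """Sample Upsilon(k): entry (i, j) estimates Corr(R_{i,t-k}, R_{j,t}).

    X is a list of T observations, each a list of N portfolio returns.
    """
    T, N = len(X), len(X[0])
    mu = [sum(x[i] for x in X) / T for i in range(N)]
    sd = [math.sqrt(sum((x[i] - mu[i]) ** 2 for x in X) / T) for i in range(N)]
    ups = [[0.0] * N for _ in range(N)]
    for i in range(N):          # lagged series
        for j in range(N):      # current series
            cov = sum((X[t - k][i] - mu[i]) * (X[t][j] - mu[j])
                      for t in range(k, T)) / T
            ups[i][j] = cov / (sd[i] * sd[j])
    return ups


def simulate_lead_lag(T=20_000, lead=0.3, seed=0):
    """Hypothetical data: column 0 ('small') reacts to last period's column 1 ('large')."""
    rng = random.Random(seed)
    X, large_prev = [], 0.0
    for _ in range(T):
        large = rng.gauss(0.0, 1.0)
        small = lead * large_prev + rng.gauss(0.0, 1.0)
        X.append([small, large])
        large_prev = large
    return X


X = simulate_lead_lag()
ups1 = sample_autocorr_matrix(X, 1)
for row in ups1:
    print(" ".join(f"{v:7.3f}" for v in row))
```

In this simulation the entry below the diagonal (lagged large, current small) should be near 0.3/√1.09 ≈ 0.29, while the entry above it should be near zero, the same kind of asymmetry discussed next.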

An interesting pattern emerges from Table 2.8: the entries below the di-
agonals of $\hat\Upsilon(k)$ are almost always larger than those above the diagonals. For
example, the first-order autocorrelation between last week's return on large
stocks ($R_{5,t-1}$) and this week's return on small stocks ($R_{1t}$) is 26.5%, whereas
the first-order autocorrelation between last week's return on small stocks
($R_{1,t-1}$) and this week's return on large stocks ($R_{5t}$) is only 2.4%. Similar
patterns may be seen in the higher-order autocorrelation matrices, although
the magnitudes are smaller since the higher-order cross-autocorrelations de-
cay. The asymmetry of the $\hat\Upsilon(k)$ matrices implies that the autocovariance
matrix estimators $\hat\Gamma(k)$ are also asymmetric.

This intriguing lead-lag pattern, where larger capitalization stocks lead
and smaller capitalization stocks lag, is more apparent in Table 2.9, which
reports the differences between the autocorrelation matrices and their transposes.
Every lower-diagonal entry is positive (hence every upper-diagonal entry is
negative), implying that the correlation between current returns of smaller
stocks and past returns of larger stocks is always larger than the correlation
between current returns of larger stocks and past returns of smaller stocks.
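The comparison in Table 2.9 amounts to forming Υ̂(k) − Υ̂'(k). The sketch below does this for the first-order matrix by hard-coding the rounded Υ̂(1) entries of Table 2.8 (the fourth row's first entry is taken as 0.310, the value Table 2.9 implies). Because the inputs are rounded, some recomputed entries can differ from Table 2.9 in the last digit (the (5, 2) entry, for instance, comes out as 0.186), but the sign pattern, positive below the diagonal, is reproduced. The function name `asymmetry` is hypothetical.

```python
# Rounded first-order autocorrelation matrix from Table 2.8:
# row i is R_{i,t-1}, column j is R_{j,t}; portfolio 1 holds the smallest stocks.
UPS1 = [
    [0.352, 0.226, 0.171, 0.115, 0.024],
    [0.330, 0.232, 0.182, 0.129, 0.037],
    [0.324, 0.244, 0.197, 0.147, 0.053],
    [0.310, 0.242, 0.201, 0.153, 0.059],
    [0.265, 0.223, 0.187, 0.147, 0.057],
]


def asymmetry(ups):
    """Return ups - ups' rounded to three decimals (an antisymmetric matrix)."""
    n = len(ups)
    return [[round(ups[i][j] - ups[j][i], 3) for j in range(n)] for i in range(n)]


diff = asymmetry(UPS1)
below = [diff[i][j] for i in range(len(diff)) for j in range(i)]
print("every lower-diagonal entry positive:", all(d > 0 for d in below))
for row in diff:
    print(" ".join(f"{d:7.3f}" for d in row))
```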

Of course, the nontrading model of Chapter 3 also yields an asym-
metric autocorrelation matrix. However, we shall see in that chapter that
unrealistically high probabilities of nontrading are required to generate
cross-autocorrelations of the magnitude reported in Table 2.8.

The results in Tables 2.8 and 2.9 point to the complex patterns of cross-
effects among securities as significant sources of positive index autocorre-
lation. Indeed, Lo and MacKinlay (1990c) show that over half of the posi-
tive index autocorrelation is attributable to positive cross-effects. They also
observe that positive cross-effects can explain the apparent profitability of
contrarian investment strategies, strategies that are contrary to the general
market direction. These strategies, predicated on the notion that investors
tend to overreact to information, consist of selling "winners" and buying
"losers." Selling the winners and buying the losers will earn positive ex-
pected profits in the presence of negative serial correlation because current
losers are likely to become future winners and current winners are likely to
become future losers.


Table 2.9. Asymmetry of cross-autocorrelation matrices.

                        R1       R2       R3       R4       R5
Υ̂(1) - Υ̂'(1)   R1    0.000   -0.104   -0.153   -0.195   -0.241
                 R2    0.104    0.000   -0.061   -0.113   -0.181
                 R3    0.153    0.061    0.000   -0.054   -0.134
                 R4    0.195    0.113    0.054    0.000   -0.088
                 R5    0.241    0.181    0.134    0.088    0.000

Υ̂(2) - Υ̂'(2)   R1    0.000   -0.052   -0.079   -0.089   -0.094
                 R2    0.052    0.000   -0.029   -0.042   -0.055
                 R3    0.079    0.029    0.000   -0.014   -0.029
                 R4    0.089    0.042    0.014    0.000   -0.018
                 R5    0.094    0.055    0.029    0.018    0.000

Υ̂(3) - Υ̂'(3)   R1    0.000   -0.035   -0.069   -0.087   -0.093
                 R2    0.035    0.000   -0.034   -0.054   -0.062
                 R3    0.069    0.034    0.000   -0.022   -0.035
                 R4    0.087    0.054    0.022    0.000   -0.018
                 R5    0.093    0.062    0.035    0.018    0.000

Υ̂(4) - Υ̂'(4)   R1    0.000   -0.033   -0.059   -0.084   -0.102
                 R2    0.033    0.000   -0.024   -0.050   -0.070
                 R3    0.059    0.024    0.000   -0.023   -0.049
                 R4    0.084    0.050    0.023    0.000   -0.030
                 R5    0.102    0.070    0.049    0.030    0.000

Differences between autocorrelation matrices and their transposes for the vector of size-
sorted portfolio returns $X_t \equiv [R_{1t}\ R_{2t}\ R_{3t}\ R_{4t}\ R_{5t}]'$, where $R_{it}$ is the week-t return on
the equal-weighted portfolio of stocks in the ith quintile, i = 1, ..., 5 (quintile 1 contains the
smallest stocks), for the sample of NYSE-AMEX stocks from July 10, 1962 to December 27,
1994 (1,695 observations). Note that $\Upsilon(k) \equiv D^{-1/2}\,\mathrm{E}[(X_{t-k}-\mu)(X_t-\mu)']\,D^{-1/2}$, where
$D \equiv \mathrm{diag}[\sigma_1^2, \ldots, \sigma_5^2]$.


But the presence of positive cross-effects provides another channel 
through which contrarian strategies can be profitable. If, for example, a 
high return for security A today implies that security B's return will probably 
be high tomorrow, then a contrarian investment strategy will be profitable 
even if each security’s returns are unforecastable using past returns of that 
security alone. To see how, suppose the market consists of only the two
stocks, A and B; if A's return is higher than the market today, a contrarian
sells it and buys B. But if A and B are positively cross-autocorrelated, a
higher return for A today implies a higher return for B tomorrow on aver- 
age, and thus the contrarian will have profited from his long position in B 
on average. 
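The two-stock argument can be checked by simulation. The sketch below assumes a hypothetical market in which B_t = β A_{t-1} + noise and A is IID, so neither stock's return is predictable from its own past, and applies contrarian weights w_it = -(1/N)(R_{i,t-1} - R_{m,t-1}), which are long yesterday's relative loser and short yesterday's relative winner; this equal-weighted relative-return rule is our illustrative choice. With unit-variance shocks and zero means, a short calculation gives an expected profit of β/4 per period, which the simulated average should approximate. All names and parameter values are hypothetical.

```python
import random


def contrarian_two_stock(beta=0.5, T=100_000, seed=0):
    """Simulate contrarian profits when A's return today predicts B's tomorrow.

    Hypothetical model: A_t ~ N(0, 1) IID, B_t = beta * A_{t-1} + N(0, 1) noise.
    Weights: w_i,t = -(1/N) * (R_i,t-1 - R_m,t-1) with N = 2 (zero net investment).
    """
    rng = random.Random(seed)
    a_prev = b_prev = 0.0
    profits, a_series, b_series = [], [], []
    for t in range(T):
        a = rng.gauss(0.0, 1.0)
        b = beta * a_prev + rng.gauss(0.0, 1.0)
        if t > 0:
            m_prev = 0.5 * (a_prev + b_prev)   # yesterday's equal-weighted market return
            w_a = -0.5 * (a_prev - m_prev)
            w_b = -0.5 * (b_prev - m_prev)
            profits.append(w_a * a + w_b * b)
        a_series.append(a)
        b_series.append(b)
        a_prev, b_prev = a, b
    return profits, a_series, b_series


def lag1_autocorr(x):
    """Sample first-order autocorrelation."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


profits, a, b = contrarian_two_stock()
print(f"mean profit = {sum(profits) / len(profits):.4f} (beta/4 = 0.125)")
print(f"own autocorrelations: A {lag1_autocorr(a):.4f}, B {lag1_autocorr(b):.4f}")
```

Both own autocorrelations should be close to zero, yet the average profit is positive: the profit comes entirely from the cross-autocorrelation between A's past and B's present.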
Nowhere is it required that the stock market overreacts, i.e., that indi-
vidual returns are negatively autocorrelated. Therefore, the fact that some
contrarian strategies have positive expected profits need not imply stock
market overreaction. In fact, for the particular contrarian strategy that Lo
and MacKinlay (1990c) examine, over half of the expected profits is due
to cross-effects and not to negative autocorrelation in individual security
returns.
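The same logic holds for N securities. Assuming (A1) and contrarian weights $w_{it} = -\frac{1}{N}(R_{i,t-1} - R_{m,t-1})$ (an illustrative choice of weighting scheme, with $R_{mt} = \iota' R_t/N$ the equal-weighted index), and writing $\mu_m \equiv \iota'\mu/N$ and $\sigma^2(\mu) \equiv \frac{1}{N}\sum_i (\mu_i - \mu_m)^2$, the expected profit splits into a cross-effect term, an own-effect term, and a term due to dispersion in mean returns:

$$
\begin{aligned}
\pi_t &\equiv \sum_{i=1}^{N} w_{it} R_{it} = -\frac{1}{N}\sum_{i=1}^{N} R_{i,t-1} R_{it} + R_{m,t-1} R_{mt}, \\
\mathrm{E}[\pi_t] &= -\frac{1}{N}\Big[\mathrm{tr}(\Gamma(1)) + \sum_{i=1}^{N}\mu_i^2\Big] + \frac{\iota'\Gamma(1)\iota}{N^2} + \mu_m^2 \\
&= \underbrace{\frac{\iota'\Gamma(1)\iota - \mathrm{tr}(\Gamma(1))}{N^2}}_{\text{cross-effects}}
\;\underbrace{-\;\frac{N-1}{N^2}\,\mathrm{tr}(\Gamma(1))}_{\text{own-effects}}
\;-\;\sigma^2(\mu).
\end{aligned}
$$

If every own-autocovariance is zero, the own-effect term vanishes and expected profits can be positive only if the cross-effect term exceeds σ²(μ); negative own-autocovariances, the overreaction channel, would contribute a separate positive own-effect term.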

These cross-effects may also explain the apparent profitability of several
other trading strategies that have recently become popular in the financial
community. For example, long/short or market-neutral strategies, in which
long positions are offset dollar-for-dollar by short positions, can earn superior
returns in exactly the fashion described above, despite the fact that they are
designed to take advantage of own-effects, i.e., positive and negative forecasts
of individual securities' expected returns. The performance of matched-book
or pairs trading strategies can also be attributed to cross-effects as well as
own-effects.

Although several studies have attempted to explain these striking lead-
lag effects (see, for example, Badrinath, Kale, and Noe [1995], Boudoukh,
Richardson, and Whitelaw [1994], Jegadeesh and Swaminathan [1993],
Conrad, Kaul, and Nimalendran [1991], Brennan, Jegadeesh, and Swami-
nathan [1993], Jegadeesh and Titman [1995], and Mech [1993]), we are
still far from having a complete understanding of their nature and sources.


2.8.4 Tests Using Long-Horizon Returns


Several recent studies have employed longer-horizon returns, multi-year re-
turns in most cases, in examining the random walk hypothesis, predictabil-
ity, and the profitability of contrarian strategies, with some surprising results.
Distinguishing between short and long return-horizons can be important be-
cause it is now well known that weekly fluctuations in stock returns differ
in many ways from movements in three- to five-year returns. We consider
the econometric trade-offs between short- and long-horizon returns in more
detail in Chapter 7, and provide only a brief discussion here of the long-
horizon implications for the random walk hypotheses.

In contrast to the positive serial correlation in daily, weekly, and monthly
index returns documented by Lo and MacKinlay (1988) and others, Fama
and French (1988b) and Poterba and Summers (1988) find negative serial
correlation in multi-year index returns. For example, Poterba and Sum-
mers (1988) report a variance ratio of 0.575 for 96-month returns of the
value-weighted CRSP NYSE index from 1926 to 1985, implying negative se-
rial correlation at some return horizons (recall that the variance ratio is a
specific linear combination of autocorrelation coefficients). Both Fama and
French (1988b) and Poterba and Summers (1988) conclude that there is
substantial mean-reversion in stock market prices at longer horizons, which
they attribute to the presence of a "transitory" component such as the one
in (2.5.1).

There is, however, good reason to be wary of such inferences when they
are based on long-horizon returns. Perhaps the most obvious concern is the
extremely small sample size: from 1926 to 1985, there are only 12 nonover-
lapping five-year returns. While overlapping returns do provide some incre-
mental information, the results in Boudoukh and Richardson (1994), Lo
and MacKinlay (1989), Richardson and Smith (1991), and Richardson and
Stock (1989) suggest that this increment is modest at best and misleading
at worst. In particular, Richardson and Stock (1989) propose an asymptotic
approximation that captures the spirit of overlapping long-horizon return
calculations: they allow the return horizon q to increase with the sample
size T so that q/T converges to a finite value δ between zero and one.
This approximation shows that variance ratios can be severely biased when the return
horizon is a significant fraction of the total sample period. For example,
using their asymptotic approximation (2.5.10), discussed in Section 2.5.1,
the expected value of the variance ratio with overlapping returns is given
by (2.5.12) under RW1. This expression implies that with a return horizon
of 96 months and a sample period of 60 years, δ = 8/60 ≈ 0.133; hence the ex-
pected variance ratio is (1 - δ)² ≈ 0.751, despite the fact that RW1 is assumed
to hold. Under RW2 and RW3, even more dramatic biases can occur (see,
for example, Romano and Thombs [1996]).
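The bias figure is simple arithmetic. The sketch below evaluates the approximation quoted above, an expected variance ratio of (1 - δ)² with δ = q/T, for a 96-month horizon in a 60-year sample, and counts the nonoverlapping five-year returns available from 1926 to 1985. The function name `rs_expected_vr` is hypothetical; horizon and sample length only need to be in the same units.

```python
def rs_expected_vr(horizon, sample_length):
    """Expected overlapping variance ratio under RW1, using the approximation
    quoted in the text: E[VR] ~ (1 - delta)^2 with delta = horizon / sample_length."""
    delta = horizon / sample_length
    return delta, (1.0 - delta) ** 2


delta, evr = rs_expected_vr(8, 60)  # 96 months = 8 years, in a 60-year sample
print(f"delta = {delta:.3f}, expected VR under RW1 = {evr:.3f}")
print("nonoverlapping five-year returns in 60 years:", 60 // 5)
# prints:
# delta = 0.133, expected VR under RW1 = 0.751
# nonoverlapping five-year returns in 60 years: 12
```

So even when RW1 holds, the approximation puts the expected overlapping 96-month variance ratio near 0.75 rather than one.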

These difficulties are reflected in the magnitudes of the standard errors
associated with long-horizon return autocorrelations and variance ratios
(see, for example, Richardson and Stock [1989, Table 5]), which are typically
so large as to yield z-statistics close to zero regardless of the point estimates.
Richardson (1993) and Richardson and Stock (1989) show that properly
adjusting for the small sample sizes, and for other statistical issues associated
with long-horizon returns, reverses many of the inferences of Fama and
French (1988b) and Poterba and Summers (1988).

Moreover, the point estimates of autocorrelation coefficients and other
time series parameters tend to exhibit considerable sampling variation for
long-horizon returns. For example, simple bias adjustments can change
the signs of the autocorrelations, as Kim, Nelson, and Startz (1988) and
Richardson and Stock (1989) demonstrate. This is not surprising given the
extremely small sample sizes that long-horizon returns produce (see, for
example, the magnitude of the bias adjustments in Section 2.4.1).

Finally, Kim, Nelson, and Startz (1988) show that the negative serial cor-
relation in long-horizon returns is extremely sensitive to the sample period
and may be largely due to the first ten years of the 1926 to 1985 sample.
Although ten years is a very significant portion of the data and cannot be
excluded without careful consideration, it is nevertheless troubling that the
sign of the serial correlation coefficient hinges on data from the Great De-
pression. This conundrum, whether to omit data influenced by a single
cataclysmic event or to include it and argue that such an event is repre-
sentative of the economic system, underscores the fragility of small-sample
statistical inference. Overall, there is little evidence for mean reversion in
long-horizon returns, though this may be more a symptom of small sam-
ple sizes than conclusive evidence against mean reversion; we simply
cannot tell.

These considerations point to short-horizon returns as the more imme-
diate source from which evidence of predictability might be culled. This is
not to say that a careful investigation of returns over longer time spans will
be uninformative. Indeed, it may be only at these lower frequencies that the
impact of economic factors such as the business cycle is detectable. More-
over, to the extent that transaction costs are greater for strategies exploiting
short-horizon predictability, long-horizon predictability may be a more gen-
uine form of unexploited profit opportunity. Nevertheless, the econometric
challenges posed by long-horizon returns are considerable, and the need
for additional economic structure is particularly great in such cases.


2.9 Conclusion 


Recent econometric advances and empirical evidence seem to suggest that
financial asset returns are predictable to some degree. Thirty years ago this
would have been tantamount to an outright rejection of market efficiency.
However, modern financial economics teaches us that other, perfectly ra-
tional, factors may account for such predictability. The fine structure of
securities markets and frictions in the trading process can generate pre-
dictability. Time-varying expected returns due to changing business condi-
tions can generate predictability. A certain degree of predictability may be
necessary to reward investors for bearing certain dynamic risks. Motivated
by these considerations, we shall develop many models and techniques to
address these and other related issues in the coming chapters.


Problems—Chapter 2 


2.1 If $\{P_t\}$ is a martingale, show that: (1) the minimum mean-squared error
forecast of $P_{t+k}$, conditioned on the entire history $\{P_t, P_{t-1}, \ldots\}$, is simply $P_t$;
(2) nonoverlapping kth differences are uncorrelated at all leads and lags
for all k > 0.


2.2 How are the RW1, RW2, RW3, and martingale hypotheses related (in-
clude a Venn diagram to illustrate the relations among the four models)?
Provide specific examples of each.




2.3 Characterize the set of all two-state Markov chains (2.2.9) that do not
satisfy RW1 and for which the CJ statistic is one. What are the general prop-
erties of such Markov chains, e.g., do they generate sequences, reversals,
etc.?


2.4 Derive (2.4.19) for processes with stationary increments. Why do the
weights decline linearly? Using this expression, construct examples of non-
random-walk processes for which the variance ratio test has very low power.


2.5 Using daily and monthly returns data for ten individual stocks and the 
equal- and value-weighted CRSP market indexes (EWRETD and VWRETD), 
perform the following statistical analysis using any statistical package of your 
choice. Note that some of the stocks do not have complete return histories, 
so be sure to use only valid observations. Also, for subsample analyses, split 
the available observations into equal subsamples.


2.5.1 Compute the sample mean μ̂, standard deviation σ̂, and first-order
autocorrelation coefficient ρ̂(1) for daily simple returns over the entire
1962 to 1994 sample period for the ten stocks and the two indexes. Split
the sample into four equal subperiods and compute the same statistics in
each subperiod. Are they stable over time?


2.5.2 Compute the sample mean μ̂, standard deviation σ̂, and first-order
autocorrelation coefficient ρ̂(1) for continuously compounded daily re-
turns over the entire 1962 to 1994 period, and for each of the four equal
subperiods. Compare these to the results for simple returns. Can con-
tinuous compounding change inferences substantially?


2.5.3 Plot histograms of daily simple returns for VWRETD and EWRETD
over the entire 1962 to 1994 sample period. Plot another histogram
of the normal distribution with mean and variance equal to the sample
mean and variance of the returns plotted in the first histograms. Do daily
simple returns look approximately normal? Which looks closer to nor-
mal: VWRETD or EWRETD? Perform the same analysis for continuously
compounded daily returns and compare these results to those for simple
returns.


2.5.4 Using daily simple returns for the entire 1962 to 1994 sample pe-
riod, construct 99% confidence intervals for μ for VWRETD, EWRETD,
and the ten individual stock return series. Divide the sample into four
equal subperiods and construct 99% confidence intervals in each of the
four subperiods for the twelve series. Do they shift a great deal?


2.5.5 Compute the skewness, kurtosis, and studentized range of daily
simple returns of VWRETD, EWRETD, and the ten individual stocks over
the entire 1962 to 1994 sample period, and in each of the four equal
subperiods. Which of the skewness, kurtosis, and studentized range esti-
mates are statistically different from the skewness, kurtosis, and studen-
tized range of a normal random variable at the 5% level? For these twelve
series, perform the same calculations using monthly data. What do you
conclude about the normality of these return series, and why?
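For readers working through Problem 2.5 without a statistics package, the following is a minimal pure-Python sketch of the sample statistics involved. It assumes each return series has already been loaded as a list of floats with invalid observations removed (loading CRSP data is not shown). Moments use 1/T normalization, the studentized range is taken as (max - min)/σ̂, and the z-scores use the standard asymptotic standard errors √(6/T) for skewness and √(24/T) for kurtosis under IID normality. All function names are hypothetical.

```python
import math


def summary_stats(returns):
    """Sample mean, std, first-order autocorrelation, skewness, kurtosis,
    and studentized range of a list of returns (1/T normalization)."""
    T = len(returns)
    mu = sum(returns) / T
    dev = [r - mu for r in returns]
    var = sum(d * d for d in dev) / T
    sigma = math.sqrt(var)
    rho1 = sum(dev[t] * dev[t - 1] for t in range(1, T)) / (T * var)
    skew = sum(d ** 3 for d in dev) / (T * sigma ** 3)
    kurt = sum(d ** 4 for d in dev) / (T * sigma ** 4)
    srange = (max(returns) - min(returns)) / sigma
    return {"mean": mu, "std": sigma, "rho1": rho1, "skewness": skew,
            "kurtosis": kurt, "studentized_range": srange}


def normality_z(stats, T):
    """Asymptotic z-scores under IID normality (skewness 0, kurtosis 3)."""
    z_skew = stats["skewness"] / math.sqrt(6.0 / T)
    z_kurt = (stats["kurtosis"] - 3.0) / math.sqrt(24.0 / T)
    return z_skew, z_kurt


def equal_subperiods(returns, k=4):
    """Split into k consecutive subperiods of equal length; the last
    len(returns) % k observations are dropped."""
    n = len(returns) // k
    return [returns[i * n:(i + 1) * n] for i in range(k)]


def simple_to_log(returns):
    """Continuously compounded returns r = log(1 + R) from simple returns R."""
    return [math.log1p(R) for R in returns]


# Usage on a short hypothetical series (not CRSP data)
example = [0.012, -0.004, 0.007, -0.015, 0.003, 0.009, -0.002, 0.001]
stats = summary_stats(example)
print(stats)
print(normality_z(stats, len(example)))
for sub in equal_subperiods(example):
    print(summary_stats(sub)["mean"])
```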




Market Microstructure 


WHILE IT IS ALWAYS the case that some features of the data will be lost in the
process of modeling economic phenomena, determining which features 
to focus on requires some care and judgment. In exploring the dynamic 
properties of financial asset prices in Chapter 2, we have taken prices and 
returns as the principal objects of interest without explicit reference to the 
institutional structures in which they are determined. We have ignored 
the fact that security prices are generally denominated in fixed increments, 
typically eighths of a dollar or ticks for stock prices. Also, securities do not 
trade at evenly spaced intervals throughout the day, and on some days they 
do not trade at all. Indeed, the very process of trading can have an important 
impact on the statistical properties of financial asset prices: In markets with 
designated marketmakers, the existence of a spread between the price at 
which the marketmaker is willing to buy (the bid price) and the price at 
which the marketmaker is willing to sell (the offer or ask price) can have a
nontrivial impact on the serial correlation of price changes. 

For some purposes, such aspects of the market's microstructure can be 
safely ignored, particularly when longer investment horizons are involved. 
For example, it is unlikely that bid-ask bounce (to be defined in Section 
3.2) is responsible for the negative autocorrelation in the five-year returns 
of US stock indexes such as the Standard and Poor's 500,¹ even though
the existence of a bid-ask spread does induce negative autocorrelation in
returns (see Section 3.2.1). 

However, for other purposes (the measurement of execution costs and
market liquidity, the comparison of alternative marketmaking mechanisms,
the impact of competition and the potential for collusion among market-
makers), market microstructure is central. Indeed, market microstructure
is now one of the most active research areas in economics and finance, span-
ning many markets and many models.² To test some of these models, and
to determine the importance of market microstructure effects for other re-
search areas, we require some empirical measures of market microstructure
effects. We shall construct such measures in this chapter.

¹See Section 2.5 in Chapter 2 and Section 7.2.1 in Chapter 7 for further discussion of
long-horizon returns.

In Section 3.1, we present a simple model of the trading process to
capture the effects of nonsynchronous trading. In Section 3.2, we consider the
effects of the bid-ask spread on the time-series properties of price changes,
and in Section 3.3 we explore several techniques for modeling transactions
data, which pose unique challenges including price discreteness and
irregular sampling intervals.


3.1 Nonsynchronous Trading 


The nonsynchronous trading or nontrading effect arises when time series, usu- 
ally asset prices, are taken to be recorded at time intervals of one length 
when in fact they are recorded at time intervals of other, possibly irregular, 
lengths. For example, the daily prices of securities quoted in the financial 
press are usually closing prices, prices at which the last transaction in each 
of those securities occurred on the previous business day. These closing 
prices generally do not occur at the same time each day, but by referring 
to them as "daily" prices, we have implicitly and incorrectly assumed that 
they are equally spaced at 24-hour intervals. As we shall see below, such an 
assumption can create a false impression of predictability in price changes 
and returns even if true price changes or returns are statistically
independent.

In particular, the nontrading effect induces potentially serious biases
in the moments and co-moments of asset returns such as their means,
variances, covariances, betas, and autocorrelation and cross-autocorrelation
coefficients. For example, suppose that the returns to stocks A and B are
temporally independent but A trades less frequently than B. If news
affecting the aggregate stock market arrives near the close of the market on one
day, it is more likely that B's end-of-day price will reflect this information
than A's, simply because A may not trade after the news arrives. Of course,
A will respond to this information eventually, but the fact that it responds
with a lag induces spurious cross-autocorrelation between the daily returns
of A and B when calculated with closing prices. This lagged response will
also induce spurious own-autocorrelation in the daily returns of A: During
periods of nontrading, A's observed return is zero, and when A does trade,
its observed return reverts to the cumulated mean return; this mean
reversion creates negative serial correlation in A's returns. These effects
have obvious implications for tests of predictability and nonlinearity in asset
returns (see Chapters 2 and 12), as well as for quantifying the trade-offs
between risk and expected return (see Chapters 4–6).

² The literature is far too vast to give a complete citation list here. In addition to the
citations listed in each of the sections below, readers interested in an introduction to market
microstructure are encouraged to consult the following excellent monographs and conference
volumes that, together, provide a fairly complete treatment of the major issues and models in
this literature: Cohen, Maier, Schwartz, and Whitcomb (1986), Davis and Holt (1993), Frankel,
Galli, and Giovannini (1996), Kagel and Roth (1995), Lo (1995), O'Hara (1995), and SEC
(1994).

Perhaps the first to recognize the importance of nonsynchronous prices
was Fisher (1966). More recently, explicit models of nontrading have been
developed by Atchison, Butler, and Simonds (1987), Cohen, Maier, Schwartz,
and Whitcomb (1978, 1979), Cohen, Hawawini, Maier, Schwartz, and
Whitcomb (1983b), Dimson (1979), Lo and MacKinlay (1988, 1990a, 1990c),
and Scholes and Williams (1977). Whereas earlier studies considered the
effects of nontrading on empirical applications of the Capital Asset Pricing
Model and the Arbitrage Pricing Theory,³ more recent attention has been
focused on spurious autocorrelations induced by nonsynchronous trading.⁴
Although the various models of nontrading may differ in their specifics, they
all have the common theme of modeling the behavior of asset returns that
are mistakenly assumed to be measured at evenly spaced time intervals when
in fact they are not.


3.1.1 A Model of Nonsynchronous Trading 


Since most empirical investigations of stock price behavior focus on returns
or price changes, we take as primitive the (unobservable) return-generating
process of a collection of N securities. To capture the effects of
nontrading, we shall follow the nonsynchronous trading model of Lo and MacKinlay
(1990a), which associates with each security i in each period t an unobserved
or virtual continuously compounded return $r_{it}$. These virtual returns
represent changes in the underlying value of the security in the absence of any
trading frictions or other institutional rigidities. They reflect both
company-specific information and economy-wide effects, and in a frictionless market
these returns would be identical to the observed returns of the security.

To model the nontrading phenomenon as a purely spurious statistical
artifact (not an economic phenomenon motivated by private information
and strategic considerations), suppose in each period t there is some
probability $\pi_i$ that security i does not trade, and that whether the security trades or not
is independent of the virtual returns $\{r_{it}\}$ (and all other random variables
in this model).⁵ Therefore, this nontrading process can be viewed as an
IID sequence of coin tosses,⁶ with different nontrading probabilities across
securities. By allowing cross-sectional differences in the random nontrading
processes, we shall be able to capture the effects of nontrading on the
returns of portfolios of securities.

³ See, for example, Cohen, Hawawini, Maier, Schwartz, and Whitcomb (1983a, b), Dimson
(1979), Scholes and Williams (1977), and Shanken (1987b).

⁴ See Atchison, Butler, and Simonds (1987), Cohen, Maier, Schwartz, and Whitcomb (1979,
1986), and Lo and MacKinlay (1988, 1988b, 1990a, 1990c).

The observed return of security i, $r^o_{it}$, depends on whether security i trades
in period t. If security i does not trade in period t, let its observed return be
zero: if no trades occur, then the closing price is set to the previous period's
closing price, and hence $r^o_{it} = \log(p_{it}/p_{i,t-1}) = \log 1 = 0$. If, on the other
hand, security i does trade in period t, let its observed return be the sum of
the virtual returns in period t and in all prior consecutive periods in which i
did not trade.

For example, consider a sequence of five consecutive periods in which
security i trades in periods 1, 2, and 5, and does not trade in periods 3 and
4. The above nontrading mechanism implies that⁷ the observed return in
period 2 is simply the virtual return ($r^o_{i2} = r_{i2}$); the observed returns in periods
3 and 4 are both zero ($r^o_{i3} = r^o_{i4} = 0$); and the observed return in period 5
is the sum of the virtual returns from periods 3 to 5 ($r^o_{i5} = r_{i3} + r_{i4} + r_{i5}$).
This captures the essential feature of nontrading as a source of spurious
autocorrelation: News affects those stocks that trade more frequently first
and influences the returns of more thinly traded securities with a lag. In this
framework the impact of news on returns is captured by the virtual returns
process $r_{it}$, and the impact of the lag induced by nontrading is captured by the
observed returns process $r^o_{it}$.
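As an illustration, the cumulation rule can be written in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `observed_returns` is hypothetical, and we assume the security traded in the period immediately before the sample begins, so that nothing is carried into the first period (compare footnote 7). The inputs mirror the five-period example above.

```python
def observed_returns(virtual, traded):
    """Map virtual returns r_it to observed returns r^o_it.

    virtual: sequence of virtual returns, one per period
    traded:  sequence of booleans, True if the security trades that period
    Assumes (hypothetically) a trade in the period just before the first
    element, so no virtual returns are carried into period 1.
    """
    observed = []
    pending = 0.0  # virtual returns accumulated since the last trade
    for r, trades in zip(virtual, traded):
        pending += r
        if trades:
            # A trade releases today's virtual return plus everything
            # accumulated over the preceding run of no-trade periods.
            observed.append(pending)
            pending = 0.0
        else:
            # No trade: the closing price is unchanged, so r^o_it = 0.
            observed.append(0.0)
    return observed


# Five-period example: trades in periods 1, 2, and 5 only.
virtual = [0.01, 0.02, -0.01, 0.03, 0.005]
traded = [True, True, False, False, True]
observed = observed_returns(virtual, traded)
print(observed)  # last entry is r_3 + r_4 + r_5 (about 0.025)
```

Note that the observed returns sum to the virtual returns whenever the sample ends on a trade: nontrading delays returns but does not destroy them.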

To complete the specification of this nontrading model, suppose that
virtual returns are governed by a one-factor linear model:

$$ r_{it} = \mu_i + \beta_i f_t + \epsilon_{it}, \qquad i = 1, \ldots, N, \tag{3.1.1} $$

where $f_t$ is some zero-mean common factor and $\epsilon_{it}$ is zero-mean idiosyncratic
noise that is temporally and cross-sectionally independent at all leads and
lags. Since we wish to focus on nontrading as the sole source of
autocorrelation, we also assume that the common factor $f_t$ is IID and is independent of
$\epsilon_{i,t-k}$ for all i, t, and k.⁸ Each period's virtual return is random and captures
movements caused by information arrival as well as idiosyncratic noise. The
particular nontrading and return-cumulation process we assume captures
the lag with which news and noise are incorporated into security prices due to
infrequent trading. The dynamics of such a stylized model are surprisingly
rich, and they yield several important empirical implications.

⁵ The case where trading is correlated with virtual returns is not without interest, but it is
inconsistent with the spirit of nontrading as a kind of measurement error. In the presence
of private information and strategic behavior, trading activity does typically depend on virtual
returns (suitably defined), and strategic trading can induce serial correlation in observed
returns, but such correlation can hardly be dismissed as "spurious." See Section 3.1.2 for
further discussion.

⁶ This assumption may be relaxed to allow for state-dependent probabilities, i.e.,
autocorrelated nontrading; see the discussion in Section 3.1.2.

⁷ Period 1's return obviously depends on how many consecutive periods prior to period 1
the security did not trade. If it traded in period 0, then the period-1 return is simply equal
to its virtual return; if it did not trade in period 0 but did trade in period −1, then period 1's
observed return is the sum of period 0's and period 1's virtual returns; etc.

To derive an explicit expression for the observed returns process and to 
deduce its time-series properties we introduce two related random variables: 


$$ \delta_{it} = \begin{cases} 1 \ (\text{no trade}) & \text{with probability } \pi_i \\ 0 \ (\text{trade}) & \text{with probability } 1-\pi_i \end{cases} \tag{3.1.2} $$

$$ X_{it}(k) \equiv (1-\delta_{it})\,\delta_{i,t-1}\,\delta_{i,t-2}\cdots\delta_{i,t-k} = \begin{cases} 1 & \text{with probability } (1-\pi_i)\pi_i^k \\ 0 & \text{with probability } 1-(1-\pi_i)\pi_i^k \end{cases}, \qquad k>0, \tag{3.1.3} $$

where $X_{it}(0) \equiv 1-\delta_{it}$, $\{\delta_{it}\}$ is assumed to be independent of $\{\delta_{jt}\}$ for $i \neq j$,
and $\{\delta_{it}\}$ is temporally IID for each $i = 1, 2, \ldots, N$.

The indicator variable $\delta_{it}$ takes on the value one when security i does
not trade in period t and is zero otherwise. $X_{it}(k)$ is also an indicator variable
and takes on the value one when security i trades in period t but has not
traded in any of the k previous consecutive periods, and is zero otherwise.
Since $\pi_i$ is within the unit interval, for large k the variable $X_{it}(k)$ will be zero
with high probability. This is not surprising since it is highly unlikely that
security i should trade today but never in the past.

Having defined the $X_{it}(k)$'s, it is now a simple matter to derive an explicit
expression for observed returns $r^o_{it}$:

$$ r^o_{it} = \sum_{k=0}^{\infty} X_{it}(k)\, r_{i,t-k}, \qquad i = 1, \ldots, N. \tag{3.1.4} $$

If security i does not trade in period t, then $\delta_{it} = 1$, which implies that $X_{it}(k) = 0$
for all k, and thus $r^o_{it} = 0$. If i does trade in period t, then its observed return
is equal to the sum of today's virtual return $r_{it}$ and its past $k_{it}$ virtual returns,
where the random variable $k_{it}$ is the number of past consecutive periods that
i has not traded. We call this the duration of nontrading, which may be
expressed as

$$ k_{it} \equiv \sum_{k=1}^{\infty} \left( \prod_{j=1}^{k} \delta_{i,t-j} \right). \tag{3.1.5} $$

Although (3.1.4) will prove to be more convenient for subsequent
calculations, $k_{it}$ may be used to give a somewhat more intuitive definition of the
observed returns process:

$$ r^o_{it} = (1-\delta_{it}) \sum_{k=0}^{k_{it}} r_{i,t-k}, \qquad i = 1, \ldots, N. \tag{3.1.6} $$

⁸ These strong assumptions are made primarily for expositional convenience and may be
relaxed considerably. See Section 3.1.2 for further discussion.


Whereas (3.1.4) shows that in the presence of nontrading the observed
returns process is a (stochastic) function of all past returns, the equivalent
relation (3.1.6) reveals that $r^o_{it}$ may also be viewed as a random sum with a
random number of terms.⁹

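A third and perhaps most natural way to view observed returns is the
following:

$$ r^o_{it} = \begin{cases} 0 & \text{with probability } \pi_i \\ r_{it} & \text{with probability } (1-\pi_i)^2 \\ r_{it} + r_{i,t-1} & \text{with probability } (1-\pi_i)^2\pi_i \\ r_{it} + r_{i,t-1} + r_{i,t-2} & \text{with probability } (1-\pi_i)^2\pi_i^2 \\ \quad\vdots & \quad\vdots \\ \sum_{j=0}^{k} r_{i,t-j} & \text{with probability } (1-\pi_i)^2\pi_i^k \\ \quad\vdots & \quad\vdots \end{cases} \tag{3.1.7} $$

Expressed in this way, it is apparent that nontrading can induce spurious
serial correlation in observed returns because each $r^o_{it}$ contains within it
the sum of the past k consecutive virtual returns for every k with some positive
probability $(1-\pi_i)^2\pi_i^k$.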
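The probabilities in (3.1.7) can be checked numerically. The sketch below uses an arbitrary illustrative value $\pi_i = 0.6$ and hypothetical helper names; it confirms that the probabilities sum to one and compares them with frequencies produced by simulated IID coin tosses, again assuming a trade just before the first simulated period.

```python
import random


def observed_return_probs(pi, k_max):
    """Probabilities in (3.1.7), truncated after k_max past no-trade periods.

    Returns (p_zero, p_sum): p_zero = P(r^o_it = 0) = pi, and
    p_sum[k] = P(r^o_it = r_it + ... + r_{i,t-k}) = (1 - pi)**2 * pi**k.
    """
    p_zero = pi
    p_sum = [(1 - pi) ** 2 * pi ** k for k in range(k_max + 1)]
    return p_zero, p_sum


def simulate_frequencies(pi, T, seed=0):
    """Empirical frequencies of the outcomes in (3.1.7) from IID coin tosses.

    Key "zero" counts no-trade periods; integer key k counts trades that
    release k + 1 virtual returns. Assumes a trade just before period 1.
    """
    rng = random.Random(seed)
    counts = {"zero": 0}
    run = 0  # consecutive no-trade periods immediately preceding period t
    for _ in range(T):
        if rng.random() < pi:   # no trade in period t
            counts["zero"] += 1
            run += 1
        else:                   # trade in period t
            counts[run] = counts.get(run, 0) + 1
            run = 0
    return {key: c / T for key, c in counts.items()}


pi = 0.6  # illustrative nontrading probability
p_zero, p_sum = observed_return_probs(pi, k_max=200)
freq = simulate_frequencies(pi, T=200_000)
print("total probability:", p_zero + sum(p_sum))
for k in range(3):
    print(k, round(p_sum[k], 4), round(freq.get(k, 0.0), 4))
```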

To see how the nontrading probability $\pi_i$ is related to the duration of
nontrading, consider the mean and variance of $k_{it}$:

$$ \mathrm{E}[k_{it}] = \frac{\pi_i}{1-\pi_i}, \qquad \mathrm{Var}[k_{it}] = \frac{\pi_i}{(1-\pi_i)^2}. \tag{3.1.8} $$

If $\pi_i = \tfrac{1}{2}$, then security i goes without trading for one period at a time on
average; if $\pi_i = \tfrac{3}{4}$, then the average number of consecutive periods of nontrading
is three. As expected, if the security trades every period so that $\pi_i = 0$, both
the mean and variance of $k_{it}$ are zero.

⁹ This is similar in spirit to the Scholes and Williams (1977) subordinated stochastic process
representation of observed returns, although we do not restrict the trading times to take values
in a fixed time interval. With suitable normalizations it may be shown that our nontrading
model converges weakly to the continuous-time Poisson process of Scholes and Williams (1977).
From (3.1.4) the observed returns process may also be considered an infinite-order moving
average of virtual returns where the MA coefficients are stochastic. This is in contrast to Cohen,
Maier, Schwartz, and Whitcomb (1986, Chapter 6), in which observed returns are assumed to be
a finite-order MA process with nonstochastic coefficients. Although our nontrading process is
more general, their observed returns process includes a bid-ask spread component; ours does
not.
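A quick Monte Carlo check of (3.1.8), under the same coin-toss assumption, is sketched below. The helper names, the number of draws, and the seed are illustrative choices rather than part of the model; the two values of $\pi_i$ are the ones discussed above.

```python
import random


def duration_moments(pi):
    """Mean and variance of the nontrading duration k_it, eq. (3.1.8)."""
    return pi / (1 - pi), pi / (1 - pi) ** 2


def simulate_durations(pi, n_draws, seed=0):
    """Draw k_it by counting consecutive past no-trade periods, each of
    which occurs independently with probability pi."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        k = 0
        while rng.random() < pi:
            k += 1
        draws.append(k)
    return draws


results = {}
for pi in (0.5, 0.75):
    mean, var = duration_moments(pi)
    draws = simulate_durations(pi, 100_000)
    m = sum(draws) / len(draws)
    v = sum((d - m) ** 2 for d in draws) / len(draws)
    results[pi] = (mean, var, m, v)
    print(f"pi={pi}: E[k]={mean}, Var[k]={var}, simulated {m:.3f}, {v:.3f}")
```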


Implications for Individual Security Returns 

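To see how nontrading can affect the time-series properties of the observed
returns of individual securities, consider the moments of $r^o_{it}$, which, in turn,
depend on the moments of $X_{it}(k)$.¹⁰ For the nontrading process (3.1.2)–(3.1.3),
the observed returns processes $\{r^o_{it}\}$ ($i = 1, \ldots, N$) are
covariance-stationary with the following first and second moments:

$$ \mathrm{E}[r^o_{it}] = \mu_i \tag{3.1.9} $$

$$ \mathrm{Var}[r^o_{it}] = \sigma_i^2 + \frac{2\pi_i}{1-\pi_i}\,\mu_i^2 \tag{3.1.10} $$

$$ \mathrm{Cov}[r^o_{it}, r^o_{j,t+n}] = \begin{cases} -\mu_i^2\,\pi_i^n & \text{for } i = j,\ n > 0 \\ \dfrac{(1-\pi_i)(1-\pi_j)}{1-\pi_i\pi_j}\,\beta_i\beta_j\sigma_f^2\,\pi_j^n & \text{for } i \neq j,\ n \geq 0 \end{cases} \tag{3.1.11} $$

$$ \mathrm{Corr}[r^o_{it}, r^o_{i,t+n}] = \frac{-\mu_i^2\,\pi_i^n}{\sigma_i^2 + \dfrac{2\pi_i}{1-\pi_i}\,\mu_i^2}, \qquad n > 0, \tag{3.1.12} $$

where $\sigma_i^2 \equiv \mathrm{Var}[r_{it}]$ and $\sigma_f^2 \equiv \mathrm{Var}[f_t]$.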

From (3.1.9) and (3.1.10) it is clear that nontrading does not affect the
mean of observed returns but does increase their variance if the security has a
nonzero expected return. Moreover, (3.1.12) shows that a nonzero
expected return induces negative serial correlation in individual security
returns at all leads and lags, which decays geometrically. The intuition for
this phenomenon follows from the fact that during nontrading periods the
observed return is zero and during trading periods the observed return
reverts back to its cumulated mean return, and this mean reversion yields
negative serial correlation. When $\mu_i = 0$, there is no mean reversion and hence
no negative serial correlation.
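The moment formulas (3.1.9), (3.1.10), and (3.1.12) can be compared with a simulation. The sketch below is illustrative in its details: it treats a single security, draws Gaussian virtual returns (the model itself only requires IID returns), and uses a deliberately large ratio $\mu_i/\sigma_i$ so that the nontrading effects stand out against sampling error. The helper names are hypothetical.

```python
import random


def theoretical_moments(mu, sigma, pi, n=1):
    """Mean (3.1.9), variance (3.1.10), and lag-n autocorrelation (3.1.12)
    of one security's observed returns."""
    var = sigma ** 2 + 2 * pi / (1 - pi) * mu ** 2
    return mu, var, -(mu ** 2) * pi ** n / var


def simulate_observed(mu, sigma, pi, T, seed=0):
    """Observed returns when virtual returns are IID N(mu, sigma^2) (an
    illustrative assumption) and the security skips trading w.p. pi."""
    rng = random.Random(seed)
    out, pending = [], 0.0
    for _ in range(T):
        pending += rng.gauss(mu, sigma)
        if rng.random() < pi:   # no trade: observed return is zero
            out.append(0.0)
        else:                   # trade: release all accumulated returns
            out.append(pending)
            pending = 0.0
    return out


def sample_moments(x, n=1):
    """Sample mean, variance, and lag-n autocorrelation."""
    T = len(x)
    m = sum(x) / T
    var = sum((v - m) ** 2 for v in x) / T
    cov = sum((x[t] - m) * (x[t + n] - m) for t in range(T - n)) / T
    return m, var, cov / var


mu, sigma, pi = 1.0, 1.0, 0.5                 # deliberately large mu / sigma
theory = theoretical_moments(mu, sigma, pi)   # mean 1, variance 3, corr -1/6
sample = sample_moments(simulate_observed(mu, sigma, pi, T=200_000))
print("theory:", theory)
print("sample:", sample)
```

Setting `mu = 0.0` in the same script should drive the sample autocorrelation to roughly zero, in line with the remark that there is then no mean reversion.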


Maximal Spurious Autocorrelation

These moments also allow us to calculate the maximal negative
autocorrelation attributable to nontrading in individual security returns. Since the
autocorrelation of observed returns (3.1.12) is a nonpositive continuous
function of $\pi_i$ that is zero at $\pi_i = 0$ and approaches zero as $\pi_i$ approaches
unity, it must attain a minimum for some $\pi_i$ in [0, 1). Determining this lower
bound is a straightforward exercise in calculus, and hence we calculate it
only for the first-order autocorrelation and leave the higher-order cases to
the reader.

¹⁰ To conserve space, we summarize the results here and refer readers to Lo and MacKinlay
(1990a, 1990c) for further details.
Under (3.1.2)–(3.1.3) the minimum first-order autocorrelation of the
observed returns process $\{r^o_{it}\}$ with respect to nontrading probabilities $\pi_i$ is
given by

$$ \min_{\{\pi_i\}} \mathrm{Corr}[r^o_{it}, r^o_{i,t+1}] = -\left(\frac{|\xi_i|}{1+\sqrt{2}\,|\xi_i|}\right)^2, \tag{3.1.13} $$

where $\xi_i \equiv \mu_i/\sigma_i$, and the minimum is attained at

$$ \pi_i^* = \frac{1}{1+\sqrt{2}\,|\xi_i|}. \tag{3.1.14} $$

Over all values of $\pi_i \in (0, 1)$ and $\xi_i \in (-\infty, +\infty)$, we have

$$ \inf_{\{\pi_i,\,\xi_i\}} \mathrm{Corr}[r^o_{it}, r^o_{i,t+1}] = -\frac{1}{2}, \tag{3.1.15} $$

which is the limit of (3.1.13) as $|\xi_i|$ increases without bound, but is never
attained by finite $\xi_i$.

Although the lower bound of −1/2 seems quite significant, it is virtually
unattainable for any empirically plausible parameter values. For example,
if we consider a period to be one trading day, typical values for $\mu_i$ and
$\sigma_i$ are 0.05% and 2.5%, respectively, implying a typical value of 0.02 for $\xi_i$.
According to (3.1.13), this would induce a spurious autocorrelation no more
negative than about −0.038% in individual security returns and would require a nontrading
probability of 97.2% to attain, which corresponds to an average nontrading
duration of 35.4 days!
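The calculus behind (3.1.13)–(3.1.14), and the magnitudes in the daily example, can be reproduced with a short script. The function names are hypothetical; the grid search serves only as an independent check of the closed form, and a very large $|\xi_i|$ illustrates the infimum (3.1.15).

```python
import math


def first_order_autocorr(pi, xi):
    """Eq. (3.1.12) with n = 1, rewritten in terms of xi = mu / sigma."""
    return -(xi ** 2) * pi / (1 + 2 * pi / (1 - pi) * xi ** 2)


def min_first_order_autocorr(xi):
    """Closed-form minimum (3.1.13) and minimizer (3.1.14)."""
    s = math.sqrt(2) * abs(xi)
    return -(abs(xi) / (1 + s)) ** 2, 1 / (1 + s)


xi = 0.0005 / 0.025                    # daily mu = 0.05%, sigma = 2.5%
rho_min, pi_star = min_first_order_autocorr(xi)
duration = pi_star / (1 - pi_star)     # E[k_it] from (3.1.8)
# Brute-force check of the calculus on a fine grid of pi values.
grid_min = min(first_order_autocorr(k / 10_000, xi) for k in range(1, 10_000))
print(f"{rho_min:.4%}  {pi_star:.1%}  {duration:.1f} periods")
print(min_first_order_autocorr(1e6)[0])  # close to, but above, -1/2
```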

These results also imply that nontrading-induced autocorrelation is
magnified by taking longer sampling intervals since, under the hypothesized
virtual returns process, doubling the holding period doubles $\mu_i$ but
only multiplies $\sigma_i$ by a factor of $\sqrt{2}$. Therefore more extreme negative
autocorrelations are feasible for longer-horizon individual returns. However,
this is not of direct empirical relevance since the effects of time
aggregation have been ignored. To see how, observe that the nontrading process
(3.1.2)-(3.1.3) is not independent of the sampling interval but changes in 
a nonlinear fashion. For example, if a period is taken to be one week, 
the possibility of daily nontrading and all its concomitant effects on weekly 
observed returns is eliminated by assumption. A proper comparison of ob- 
served returns across distinct sampling intervals must allow for nontrading at 
the finest time increment, after which the implications for coarser-sampled 
returns may be developed. We shall postpone further discussion of this and 
other issues of time aggregation until later in this section. 


Asymmetric Cross-Autocovariances 
Several other important empirical implications of this nontrading model
are captured by (3.1.11). In particular, the sign of the cross-autocovariances
is determined by the sign of $\beta_i\beta_j$. Also, the expression is not symmetric
with respect to i and j: If $\pi_i = 0$ and $\pi_j \neq 0$, then there is spurious
cross-autocovariance between $r^o_{it}$ and $r^o_{j,t+n}$, but no cross-autocovariance between
$r^o_{jt}$ and $r^o_{i,t+n}$ for any $n > 0$.¹¹ The intuition for this result is simple: When
security j exhibits nontrading, the returns to a constantly trading security i
can forecast j due to the common factor $f_t$ present in both returns. That
j exhibits nontrading implies that future observed returns $r^o_{j,t+n}$ will be a
weighted average of all past virtual returns $r_{j,t+n-k}$ (with the $X_{j,t+n}(k)$'s as
random weights), of which one term will be the current virtual return $r_{jt}$. Since
the contemporaneous virtual returns $r_{it}$ and $r_{jt}$ are correlated (because of
the common factor), $r^o_{it}$ can forecast $r^o_{j,t+n}$. However, $r^o_{it}$ is itself unforecastable
because $r^o_{it} = r_{it}$ for all t (since $\pi_i = 0$) and $r_{it}$ is IID by assumption; thus $r^o_{i,t+n}$
is uncorrelated with $r^o_{jt}$ for any $n > 0$.

The asymmetry of (3.1.11) yields an empirically testable restriction on
the cross-autocovariances of returns. Since the only source of asymmetry
in (3.1.11) is cross-sectional differences in the probabilities of nontrading,
information regarding these probabilities may be extracted from sample
moments. Specifically, denote by $\mathbf{r}^o_t$ the vector $[r^o_{1t}\ r^o_{2t} \cdots r^o_{Nt}]'$ of observed
returns of the N securities and define the autocovariance matrix $\boldsymbol{\Gamma}_n$ as

$$ \boldsymbol{\Gamma}_n \equiv \mathrm{E}[(\mathbf{r}^o_t - \boldsymbol{\mu})(\mathbf{r}^o_{t+n} - \boldsymbol{\mu})'], \qquad \boldsymbol{\mu} \equiv \mathrm{E}[\mathbf{r}^o_t]. \tag{3.1.16} $$

Denoting the (i, j)th element of $\boldsymbol{\Gamma}_n$ by $\gamma_{ij}(n)$, we have by definition

$$ \gamma_{ij}(n) = \frac{(1-\pi_i)(1-\pi_j)}{1-\pi_i\pi_j}\,\beta_i\beta_j\sigma_f^2\,\pi_j^n, \qquad i \neq j. \tag{3.1.17} $$

If the nontrading probabilities $\pi_i$ differ across securities, $\boldsymbol{\Gamma}_n$ is asymmetric.
From (3.1.17) it is evident that

$$ \frac{\gamma_{ij}(n)}{\gamma_{ji}(n)} = \left(\frac{\pi_j}{\pi_i}\right)^n. \tag{3.1.18} $$

Therefore relative nontrading probabilities may be estimated directly using
sample autocovariances $\hat{\boldsymbol{\Gamma}}_n$. To derive estimates of the probabilities $\pi_i$
themselves we need only estimate one such probability, say $\pi_1$, and the remaining
probabilities may be obtained from the ratios (3.1.18). A consistent estimator
of $\pi_1$ is readily constructed with sample means and autocovariances via
(3.1.11).
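To see how relative nontrading probabilities might be recovered from sample cross-autocovariances, one can simulate two securities driven by a common factor. Everything in this sketch is an illustrative assumption (Gaussian shocks, $\beta = 1$, $\sigma_f = 1$, idiosyncratic volatility 0.3, zero means, $\pi_a = 0.2$, $\pi_b = 0.6$), and the helper names are hypothetical. By (3.1.18) the ratio $\gamma_{ab}(1)/\gamma_{ba}(1)$ should be close to $\pi_b/\pi_a = 3$.

```python
import random


def simulate_pair(pi_a, pi_b, T, seed=0, beta=1.0, sigma_f=1.0, sigma_e=0.3):
    """Observed returns of two securities sharing a common factor f_t, with
    zero means and Gaussian shocks (illustrative assumptions)."""
    rng = random.Random(seed)
    probs = (pi_a, pi_b)
    pending = [0.0, 0.0]
    obs = ([], [])
    for _ in range(T):
        f = rng.gauss(0.0, sigma_f)
        for k in (0, 1):
            pending[k] += beta * f + rng.gauss(0.0, sigma_e)
            if rng.random() < probs[k]:   # no trade this period
                obs[k].append(0.0)
            else:                         # trade: release cumulated returns
                obs[k].append(pending[k])
                pending[k] = 0.0
    return obs


def cross_autocov(x, y, n):
    """Sample estimate of Cov[x_t, y_{t+n}]."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    return sum((x[t] - mx) * (y[t + n] - my) for t in range(T - n)) / T


ra, rb = simulate_pair(0.2, 0.6, T=250_000, seed=1)
gamma_ab = cross_autocov(ra, rb, 1)  # Cov[r^o_at, r^o_b,t+1], theory ~0.218
gamma_ba = cross_autocov(rb, ra, 1)  # Cov[r^o_bt, r^o_a,t+1], theory ~0.073
ratio = gamma_ab / gamma_ba          # estimates (pi_b / pi_a)^1 = 3
print(round(gamma_ab, 4), round(gamma_ba, 4), round(ratio, 2))
```

The smaller autocovariance sits in the denominator, so the ratio is noticeably noisier than either numerator term; longer samples or pooling over several lags would be natural refinements.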


¹¹ An alternative interpretation of this asymmetry may be found in the time-series literature
concerning Granger causality (see Granger [1969]), in which $r^o_{it}$ is said to Granger-cause $r^o_{jt}$ if
the return to i predicts the return to j. In the above example, security i Granger-causes security
j when j is subject to nontrading but i is not. Since our nontrading process may be viewed as
a form of measurement error, the fact that the returns to one security may be exogenous with
respect to the returns of another has been proposed under a different guise in Sims (1974,
1977).


Implications for Portfolio Returns 

Suppose securities are grouped by their nontrading probabilities and
equal-weighted portfolios are formed based on this grouping so that portfolio a
contains $N_a$ securities with identical nontrading probability $\pi_a$, and similarly
for portfolio b. Denote by $r^o_{at}$ and $r^o_{bt}$ the observed time-t returns on these two
portfolios, respectively, which are approximately averages of the individual
returns:¹²

$$ r^o_{\kappa t} \approx \frac{1}{N_\kappa}\sum_{i \in I_\kappa} r^o_{it}, \qquad \kappa = a, b, \tag{3.1.19} $$

where the summation is over all securities i in the set of indices $I_\kappa$ which
comprise portfolio κ. The reason (3.1.19) is not exact is that both observed
and virtual returns are assumed to be continuously compounded, and the
logarithm of a sum is not the sum of the logarithms. However, if $r^o_{it}$ takes
on small values and is not too volatile (plausible assumptions for the short
return intervals that nonsynchronous trading models typically focus on),
the approximation error in (3.1.19) is negligible.

The time-series properties of (3.1.19) may be derived from a simple
asymptotic approximation that exploits the cross-sectional independence
of the disturbances $\epsilon_{it}$. Similar asymptotic arguments can be found in the
Arbitrage Pricing Theory (APT) literature (see Chapter 6); hence our
assumption of independence may be relaxed to the same extent that it may
be relaxed in studies of the APT in which portfolios are required to be
"well-diversified."¹³ In such cases, as the number of securities in portfolios
a and b (denoted by $N_a$ and $N_b$, respectively) increases without bound, the
following equalities obtain almost surely:

$$ r^o_{\kappa t} = \mu_\kappa + \beta_\kappa \sum_{k=0}^{\infty} (1-\pi_\kappa)\,\pi_\kappa^k\, f_{t-k}, \tag{3.1.20} $$

where

$$ \mu_\kappa \equiv \frac{1}{N_\kappa}\sum_{i \in I_\kappa}\mu_i, \qquad \beta_\kappa \equiv \frac{1}{N_\kappa}\sum_{i \in I_\kappa}\beta_i, \tag{3.1.21} $$

for κ = a, b. The first and second moments of the portfolios' returns are
then given by

$$ \mathrm{E}[r^o_{\kappa t}] = \mu_\kappa = \mathrm{E}[r_{\kappa t}] \tag{3.1.22} $$

$$ \mathrm{Var}[r^o_{\kappa t}] \doteq \frac{1-\pi_\kappa}{1+\pi_\kappa}\,\beta_\kappa^2\sigma_f^2 \tag{3.1.23} $$

$$ \mathrm{Cov}[r^o_{\kappa t}, r^o_{\kappa,t+n}] \doteq \frac{1-\pi_\kappa}{1+\pi_\kappa}\,\pi_\kappa^n\,\beta_\kappa^2\sigma_f^2, \qquad n > 0 \tag{3.1.24} $$

$$ \mathrm{Corr}[r^o_{\kappa t}, r^o_{\kappa,t+n}] \doteq \pi_\kappa^n, \qquad n > 0 \tag{3.1.25} $$

$$ \mathrm{Cov}[r^o_{at}, r^o_{b,t+n}] \doteq \frac{(1-\pi_a)(1-\pi_b)}{1-\pi_a\pi_b}\,\beta_a\beta_b\sigma_f^2\,\pi_b^n, \qquad n \geq 0, \tag{3.1.26} $$

where the symbol ≐ indicates that the equality obtains only asymptotically.

¹² A precise interpretation of $r^o_{\kappa t}$ is the return to a portfolio whose value is calculated as
an unweighted geometric average of the component securities' prices. The expected return
of such a portfolio will be lower than that of an equal-weighted portfolio whose returns are
calculated as the arithmetic means of the simple returns of the component securities. This
issue is examined in greater detail by Modest and Sundaresan (1983) and Eytan and Harpaz
(1986) in the context of the Value Line Index, which was an unweighted geometric average
until 1988.

¹³ See, for example, Chamberlain (1983a), Chamberlain and Rothschild (1983), and Wang
(1993). The essence of these weaker conditions is simply to allow a Law of Large Numbers to
be applied to the average of the disturbances, so that "idiosyncratic risk" vanishes almost surely
as the cross section grows.

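From (3.1.22) we see that observed portfolio returns have the same
mean as the corresponding virtual returns. In contrast to observed
individual returns, the variance of $r^o_{\kappa t}$ is lower asymptotically than the variance of
its virtual counterpart $r_{\kappa t}$ since

$$ \mathrm{Var}[r_{\kappa t}] = \mathrm{Var}\!\left[\frac{1}{N_\kappa}\sum_{i \in I_\kappa}\left(\mu_i + \beta_i f_t + \epsilon_{it}\right)\right] = \mathrm{Var}\!\left[\mu_\kappa + \beta_\kappa f_t + \frac{1}{N_\kappa}\sum_{i \in I_\kappa}\epsilon_{it}\right] \tag{3.1.27} $$

$$ \doteq \beta_\kappa^2\sigma_f^2, \tag{3.1.28} $$

where (3.1.28) follows from the law of large numbers applied to the last
term in (3.1.27). Thus $\mathrm{Var}[r_{\kappa t}] \doteq \beta_\kappa^2\sigma_f^2$, which, because $(1-\pi_\kappa)/(1+\pi_\kappa) \leq 1$,
is greater than or equal to $\mathrm{Var}[r^o_{\kappa t}]$.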

Since the nontrading-induced autocorrelation (3.1.25) declines
geometrically, observed portfolio returns follow a first-order autoregressive
process with autoregressive coefficient equal to the nontrading probability. In
contrast to expression (3.1.11) for individual securities, the autocorrelations
of observed portfolio returns do not depend explicitly on the expected
return of the portfolio, yielding a much simpler estimator for $\pi_\kappa$: the nth
root of the nth-order autocorrelation coefficient. Therefore, we may easily
estimate all nontrading probabilities by using only the sample first-order
own-autocorrelation coefficients for the portfolio returns.
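A simulation makes the portfolio results concrete. This sketch assumes a hypothetical cross section of 100 securities with a common $\pi = 0.5$, $\beta_i = 1$, zero means, and Gaussian shocks, and it forms portfolio returns as simple averages of log returns as in (3.1.19). Because (3.1.23)–(3.1.25) hold only as the number of securities grows, a finite portfolio should give autocorrelation-based estimates slightly below $\pi$ and a variance ratio slightly above $(1-\pi)/(1+\pi) = 1/3$.

```python
import random


def simulate_portfolio(pi, n_sec, T, seed=0, beta=1.0, sigma_f=1.0, sigma_e=0.3):
    """Equal-weighted portfolio of n_sec securities that share nontrading
    probability pi, with virtual returns r_it = beta * f_t + e_it (mu = 0).

    Returns the observed and virtual portfolio return series, each formed
    as the simple average of the component log returns, as in (3.1.19)."""
    rng = random.Random(seed)
    pending = [0.0] * n_sec
    observed, virtual = [], []
    for _ in range(T):
        f = rng.gauss(0.0, sigma_f)
        obs_sum = virt_sum = 0.0
        for i in range(n_sec):
            r = beta * f + rng.gauss(0.0, sigma_e)
            virt_sum += r
            pending[i] += r
            if rng.random() >= pi:   # security i trades this period
                obs_sum += pending[i]
                pending[i] = 0.0
        observed.append(obs_sum / n_sec)
        virtual.append(virt_sum / n_sec)
    return observed, virtual


def variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def autocorr(x, n=1):
    m = sum(x) / len(x)
    cov = sum((x[t] - m) * (x[t + n] - m) for t in range(len(x) - n)) / len(x)
    return cov / variance(x)


pi = 0.5
observed, virtual = simulate_portfolio(pi, n_sec=100, T=10_000)
pi_hat_1 = autocorr(observed, 1)          # first-order estimator of pi
pi_hat_2 = autocorr(observed, 2) ** 0.5   # square root of lag-2 autocorrelation
var_ratio = variance(observed) / variance(virtual)  # about (1 - pi) / (1 + pi)
print(round(pi_hat_1, 3), round(pi_hat_2, 3), round(var_ratio, 3))
```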

Comparing (3.1.26) to (3.1.11) shows that the cross-autocovariance
between observed portfolio returns takes the same form as that of observed
individual returns. If there are differences across portfolios in the
nontrading probabilities, the autocovariance matrix for observed portfolio returns
will be asymmetric. This may give rise to the types of lead-lag relations
empirically documented by Lo and MacKinlay (1988) in size-sorted portfolios.
Ratios of the cross-autocovariances may be formed to estimate relative
nontrading probabilities for portfolios, since

$$ \frac{\mathrm{Cov}[r^o_{at}, r^o_{b,t+n}]}{\mathrm{Cov}[r^o_{bt}, r^o_{a,t+n}]} \doteq \left(\frac{\pi_b}{\pi_a}\right)^n. \tag{3.1.29} $$


In addition, for purposes of testing the overall specification of the
nontrading model, these ratios give rise to many overidentifying restrictions,
since

$$ \frac{\gamma_{ab}(n)}{\gamma_{ba}(n)} = \frac{\gamma_{a\kappa_1}(n)\,\gamma_{\kappa_1\kappa_2}(n)\cdots\gamma_{\kappa_r b}(n)}{\gamma_{\kappa_1 a}(n)\,\gamma_{\kappa_2\kappa_1}(n)\cdots\gamma_{b\kappa_r}(n)} = \left(\frac{\pi_b}{\pi_a}\right)^n \tag{3.1.30} $$

for any arbitrary sequence of distinct indices $\kappa_1, \kappa_2, \ldots, \kappa_r$, with $a \neq b \neq \kappa_1$ and $r < N$,
where N is the number of distinct portfolios and $\gamma_{ab}(n) \equiv \mathrm{Cov}[r^o_{at}, r^o_{b,t+n}]$.
Therefore, although there are $N^2$ distinct autocovariances in $\boldsymbol{\Gamma}_n$, the
restrictions implied by the nontrading process yield far fewer degrees of freedom.
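The overidentifying restrictions (3.1.30) can be verified with population values. The sketch below uses hypothetical portfolios a, b, c, d with made-up nontrading probabilities and betas. With sample autocovariances in place of the $\gamma$'s, the two sides would differ by estimation error, and that discrepancy is what a specification test would assess.

```python
from math import isclose


def gamma(a, b, n, pi, beta, sigma_f2=1.0):
    """Asymptotic Cov[r^o_{a,t}, r^o_{b,t+n}] from (3.1.26), for n >= 0."""
    return ((1 - pi[a]) * (1 - pi[b]) / (1 - pi[a] * pi[b])
            * beta[a] * beta[b] * sigma_f2 * pi[b] ** n)


# Hypothetical portfolio parameters, chosen only for illustration.
pi = {"a": 0.1, "b": 0.4, "c": 0.7, "d": 0.25}
beta = {"a": 1.2, "b": 0.9, "c": 1.0, "d": 1.1}
n = 2

direct = gamma("a", "b", n, pi, beta) / gamma("b", "a", n, pi, beta)
path = ["a", "c", "d", "b"]           # a -> kappa_1 -> kappa_2 -> b
chained = 1.0
for x, y in zip(path, path[1:]):
    chained *= gamma(x, y, n, pi, beta) / gamma(y, x, n, pi, beta)
target = (pi["b"] / pi["a"]) ** n
print(direct, chained, target, isclose(direct, chained))
```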


Time Aggregation 
The discrete-time framework we have adopted so far does not require the
specification of the calendar length of a "period." This advantage is more
apparent than real since any empirical implementation of the nontrading
model (3.1.2)–(3.1.3) must either implicitly or explicitly define a period to
be a particular fixed calendar time interval. Once the calendar time interval
has been chosen, the stochastic behavior of coarser-sampled data is restricted
by the parameters of the most finely sampled process. For example, if the
length of a period is taken to be one day, then the moments of observed
monthly returns may be expressed as functions of the parameters of the
daily observed returns process. We derive such restrictions in this section.

To do this, denote by $r^o_{i\tau}(q)$ the observed return of security i at time τ,
where one unit of τ-time is equivalent to q units of t-time; thus:

$$ r^o_{i\tau}(q) \equiv \sum_{t=(\tau-1)q+1}^{\tau q} r^o_{it}. \tag{3.1.31} $$

Then under the nontrading process (3.1.2)–(3.1.3), it can be shown that
the time-aggregated observed returns processes $\{r^o_{i\tau}(q)\}$ ($i = 1, \ldots, N$) are
covariance-stationary with the following first and second moments (see Lo
and MacKinlay [1990a]):


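$$ \mathrm{E}[r^o_{i\tau}(q)] = q\mu_i \tag{3.1.32} $$

$$ \mathrm{Var}[r^o_{i\tau}(q)] = q\sigma_i^2 + \frac{2\pi_i(1-\pi_i^q)}{(1-\pi_i)^2}\,\mu_i^2 \tag{3.1.33} $$

$$ \mathrm{Cov}[r^o_{i\tau}(q), r^o_{i,\tau+n}(q)] = -\mu_i^2\,\pi_i^{nq-q+1}\left(\frac{1-\pi_i^q}{1-\pi_i}\right)^2, \qquad n > 0 \tag{3.1.34} $$

$$ \mathrm{Corr}[r^o_{i\tau}(q), r^o_{i,\tau+n}(q)] = \frac{-\,\xi_i^2\,\pi_i^{nq-q+1}(1-\pi_i^q)^2}{q(1-\pi_i)^2 + 2\pi_i(1-\pi_i^q)\,\xi_i^2}, \qquad n > 0 \tag{3.1.35} $$

$$ \mathrm{Cov}[r^o_{i\tau}(q), r^o_{j,\tau+n}(q)] = \frac{(1-\pi_i)(1-\pi_j)}{1-\pi_i\pi_j}\,\beta_i\beta_j\sigma_f^2\left(\frac{1-\pi_j^q}{1-\pi_j}\right)^2\pi_j^{nq-q+1}, \qquad i \neq j,\ n > 0, \tag{3.1.36} $$

where $\xi_i \equiv \mu_i/\sigma_i$ as before.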

Although expected returns time-aggregate linearly, (3.1.33) shows that
variances do not. As a result of the negative serial correlation in $r^o_{it}$, the
variance of a sum is less than the sum of the variances. Time aggregation
does not affect the sign of the autocorrelations in (3.1.35), although their
magnitudes do decline with the aggregation value q. As in (3.1.12), the
autocorrelation of time-aggregated returns is a nonpositive continuous function
of $\pi_i$ on [0, 1) which is zero at $\pi_i = 0$ and approaches zero as $\pi_i$ approaches
unity, and hence it attains a minimum.
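The closed forms (3.1.33) and (3.1.34) follow from summing the one-period autocovariances (3.1.10)–(3.1.11) over all pairs of periods in two q-period blocks. The sketch below performs that summation directly and compares it with the closed forms; the parameter values and function names are arbitrary choices for the check.

```python
from math import isclose


def autocov_1period(mu, sigma, pi, lag):
    """One-period own autocovariances of r^o_it: (3.1.10) at lag 0 and the
    i = j case of (3.1.11) otherwise (symmetric in the lag)."""
    if lag == 0:
        return sigma ** 2 + 2 * pi / (1 - pi) * mu ** 2
    return -(mu ** 2) * pi ** abs(lag)


def agg_cov_bruteforce(mu, sigma, pi, q, n):
    """Cov[r^o_{i,tau}(q), r^o_{i,tau+n}(q)] by summing all q*q pairs of
    one-period autocovariances between the two blocks."""
    return sum(autocov_1period(mu, sigma, pi, n * q + u - s)
               for s in range(1, q + 1) for u in range(1, q + 1))


def agg_var_closed(mu, sigma, pi, q):
    """Closed form (3.1.33)."""
    return q * sigma ** 2 + 2 * pi * (1 - pi ** q) / (1 - pi) ** 2 * mu ** 2


def agg_cov_closed(mu, sigma, pi, q, n):
    """Closed form (3.1.34), valid for n > 0."""
    return -(mu ** 2) * pi ** (n * q - q + 1) * ((1 - pi ** q) / (1 - pi)) ** 2


checks = []
for mu, sigma, pi, q in [(0.5, 1.0, 0.6, 5), (0.02, 0.03, 0.9, 22), (1.0, 2.0, 0.3, 3)]:
    checks.append(isclose(agg_cov_bruteforce(mu, sigma, pi, q, 0),
                          agg_var_closed(mu, sigma, pi, q), rel_tol=1e-9))
    for n in (1, 2):
        checks.append(isclose(agg_cov_bruteforce(mu, sigma, pi, q, n),
                              agg_cov_closed(mu, sigma, pi, q, n), rel_tol=1e-9))
print(all(checks))
```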

To explore the behavior of the first-order autocorrelation, we plot it as
a function of $\pi_i$ in Figure 3.1 for a variety of values of q and ξ: q takes on the
values 5, 22, 66, and 244 to correspond to weekly, monthly, quarterly, and
annual returns, respectively, since q = 1 is taken to be one day; and ξ takes
on the values 0.09, 0.16, and 0.21 to correspond to daily, weekly, and monthly
returns, respectively.¹⁴ Figure 3.1a plots the first-order autocorrelation
for the four values of q with ξ = 0.09. The curve marked "q = 5" shows that
the weekly first-order autocorrelation induced by nontrading never exceeds
5% in absolute value and only attains its extreme values with a daily nontrading
probability in excess of 90%.

Although the autocorrelations of coarser-sampled returns such as
monthly or quarterly returns have more extreme minima, they are attained only
at higher nontrading probabilities. Also, time aggregation need not always
yield a more negative autocorrelation, as is apparent from the portion of
the graphs to the left of, say, π = .80; in that region, an increase in the
aggregation value q leads to an autocorrelation closer to zero. Indeed, as q
increases without bound the autocorrelation (3.1.35) approaches zero for
fixed $\pi_i$, and thus nontrading has little impact on longer-horizon returns.


¹⁴ Values for ξ were obtained by taking the ratio of the sample mean to the sample standard
deviation for daily, weekly, and monthly equal-weighted stock returns indexes for the sample
period from 1962 to 1987 as reported in Lo and MacKinlay (1988, Table 1a). Although
these values may be more representative of stock indexes than of individual securities,
for the sake of illustration they should suffice.
The effects of increasing ξ are traced out in Figures 3.1b and 3.1c. Even
if we assume ξ = 0.21 for daily data, a most extreme value, the
nontrading-induced autocorrelation in weekly returns is at most −8% and requires a
daily nontrading probability of roughly 90%. From (3.1.8) we see that when
$\pi_i = .90$ the average duration of nontrading is nine days! Since no security
listed on the New York or American Stock Exchanges is inactive for two
weeks (unless it has been delisted), we infer from Figure 3.1 that the impact
of nontrading for individual short-horizon stock returns is negligible.
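The weekly magnitudes discussed above can be checked by evaluating (3.1.35) on a grid of daily nontrading probabilities. This is a sketch: the grid resolution is arbitrary, the function name is hypothetical, and ξ = 0.09 and 0.21 are the illustrative values from the text. For ξ = 0.21 the minimum weekly (q = 5) autocorrelation comes out at about −8%, attained near π = 0.90; the last lines show that, for fixed π, coarser sampling pushes the autocorrelation toward zero.

```python
def agg_autocorr(pi, xi, q, n=1):
    """Eq. (3.1.35): lag-n autocorrelation of q-period observed returns of
    an individual security, with xi = mu / sigma for one-period returns."""
    num = xi ** 2 * pi ** (n * q - q + 1) * (1 - pi ** q) ** 2
    den = q * (1 - pi) ** 2 + 2 * pi * (1 - pi ** q) * xi ** 2
    return -num / den


grid = [k / 1000 for k in range(1, 1000)]  # daily nontrading probabilities
weekly = {}
for xi in (0.09, 0.21):
    pi_worst = min(grid, key=lambda p: agg_autocorr(p, xi, q=5))
    weekly[xi] = (pi_worst, agg_autocorr(pi_worst, xi, q=5))
    print(xi, pi_worst, round(weekly[xi][1], 4))

# Holding pi fixed, coarser sampling pushes the autocorrelation toward zero.
by_q = {q: agg_autocorr(0.5, 0.09, q) for q in (5, 22, 66, 244)}
print(by_q)
```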


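Time Aggregation for Portfolios

Similar time-aggregated analytical results can be derived for observed
portfolio returns. Denote by $r^o_{\kappa\tau}(q)$ the observed return of portfolio κ at time τ,
where one unit of τ-time is equivalent to q units of t-time; thus

$$ r^o_{\kappa\tau}(q) \equiv \sum_{t=(\tau-1)q+1}^{\tau q} r^o_{\kappa t}, \tag{3.1.37} $$

where $r^o_{\kappa t}$ is given by (3.1.19). Then under (3.1.2)–(3.1.3) the observed
portfolio returns processes $\{r^o_{a\tau}(q)\}$ and $\{r^o_{b\tau}(q)\}$ are covariance-stationary
with the following first and second moments as $N_a$ and $N_b$ increase without
bound:

$$ \mathrm{E}[r^o_{\kappa\tau}(q)] = q\mu_\kappa \tag{3.1.38} $$

$$ \mathrm{Var}[r^o_{\kappa\tau}(q)] \doteq \left[q - \frac{2\pi_\kappa(1-\pi_\kappa^q)}{1-\pi_\kappa^2}\right]\beta_\kappa^2\sigma_f^2 \tag{3.1.39} $$

$$ \mathrm{Cov}[r^o_{\kappa\tau}(q), r^o_{\kappa,\tau+n}(q)] \doteq \frac{(1-\pi_\kappa^q)^2}{1-\pi_\kappa^2}\,\pi_\kappa^{nq-q+1}\,\beta_\kappa^2\sigma_f^2, \qquad n > 0 \tag{3.1.40} $$

$$ \mathrm{Corr}[r^o_{\kappa\tau}(q), r^o_{\kappa,\tau+n}(q)] \doteq \frac{(1-\pi_\kappa^q)^2\,\pi_\kappa^{nq-q+1}}{q(1-\pi_\kappa^2) - 2\pi_\kappa(1-\pi_\kappa^q)}, \qquad n > 0 \tag{3.1.41} $$

$$ \mathrm{Cov}[r^o_{a\tau}(q), r^o_{b,\tau+n}(q)] \doteq \begin{cases} \dfrac{\beta_a\beta_b\sigma_f^2}{1-\pi_a\pi_b}\left[q(1-\pi_a\pi_b) - \dfrac{\pi_a(1-\pi_a^q)(1-\pi_b)}{1-\pi_a} - \dfrac{\pi_b(1-\pi_b^q)(1-\pi_a)}{1-\pi_b}\right] & \text{for } n = 0 \\ \dfrac{(1-\pi_a)(1-\pi_b^q)^2}{(1-\pi_a\pi_b)(1-\pi_b)}\,\pi_b^{nq-q+1}\,\beta_a\beta_b\sigma_f^2 & \text{for } n > 0 \end{cases} \tag{3.1.42} $$

for κ = a, b, q > 1, and arbitrary portfolios a, b, and time τ.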


Equation (3.1.41) shows that time aggregation also affects the autocorrelation
of observed portfolio returns in a highly nonlinear fashion. In
contrast to the autocorrelation for time-aggregated individual securities,
(3.1.41) approaches unity for any fixed \(q\) as \(\pi_a\) approaches unity; therefore
the maximal autocorrelation is one.

To investigate the behavior of the portfolio autocorrelation we plot it
as a function of the portfolio nontrading probability \(\pi_a\) in Figure 3.1d for
\(q\) = 5, 22, 66, and 244. Besides differing in sign, portfolio and individual
autocorrelations also differ in absolute magnitude, the former being
much larger than the latter for a given nontrading probability. If the nontrading
phenomenon is extant, it will be most evident in portfolio returns.
Also, portfolio autocorrelations are monotonically decreasing in \(q\), so that
time aggregation always decreases nontrading-induced serial dependence
in portfolio returns. This implies that we are most likely to find evidence of
nontrading in short-horizon returns. We exploit both these implications in
the empirical analysis of Section 3.4.1.
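The monotonicity in \(q\) and the limit as \(\pi_a \to 1\) can be seen by tabulating (3.1.41) directly. In this minimal sketch the horizons mirror those used in the figure. \(q = 1\) is included only as a reference point, where the formula reduces algebraically to \(\pi_a\). The function name is illustrative.

```python
def portfolio_autocorr(pi: float, q: int, n: int = 1) -> float:
    """Equation (3.1.41): lag-n autocorrelation of q-period observed portfolio returns."""
    num = (1 - pi ** q) ** 2 * pi ** (n * q - q + 1)
    den = q * (1 - pi ** 2) - 2 * pi * (1 - pi ** q)
    return num / den


if __name__ == "__main__":
    horizons = (1, 5, 22, 66, 244)
    for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
        cells = "  ".join(f"{portfolio_autocorr(pi, q):6.3f}" for q in horizons)
        print(f"pi = {pi:.1f}:  {cells}")
```

Reading across a row shows the autocorrelation falling as \(q\) grows. Reading down a column shows it rising toward one as \(\pi_a\) grows.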


3.1.2 Extensions and Generalizations 


Despite the simplicity of the model of nonsynchronous trading in Section
3.1.1, its implications for observed time series are surprisingly rich. The
framework can be extended and generalized in many directions with little
difficulty.

It is a simple matter to relax the assumption that individual virtual returns
are IID by allowing the common factor to be autocorrelated and the
disturbances to be cross-sectionally correlated. For example, allowing \(f_t\)
to be a stationary AR(1) is conceptually straightforward, although the
calculations become somewhat more involved. This specification will yield a
decomposition of observed autocorrelations into two components: one due
to the common factor and another due to nontrading.

Allowing cross-sectional dependence in the disturbances also complicates
the moment calculations but does not create any intractabilities.
Indeed, generalizations to multiple factors, time-series dependence of the
disturbances, and correlation between factors and disturbances are only limited
by the patience and perseverance of the reader; the necessary moment
calculations are not intractable, but merely tedious.

Dependence can be built into the nontrading process itself by assuming
that the nontrading indicators \(\delta_{it}\) are Markov chains, so that the conditional probability of trading




As we discussed earlier, some form of cross-sectional weak dependence must be imposed
so that the asymptotic arguments of the portfolio results still obtain. Of course, such an
assumption may not always be appropriate as, for example, in the case of companies within the
same industry, whose residual risks we might expect to be positively correlated. Therefore, the
asymptotic approximation will be most accurate for well-diversified portfolios.
tomorrow depends on whether or not a trade occurs today. Although this
specification does admit compact and elegant expressions for the moments
of the observed returns process, we shall leave their derivation to the reader
(see Problem 3.3). However, a brief summary of the implications for the
time-series properties of observed returns may be worthwhile: (1) Individual
security returns may be positively autocorrelated and portfolio returns
may be negatively autocorrelated, but these possibilities are unlikely given
empirically relevant parameter values; (2) It is possible, but unlikely, for
autocorrelation matrices to be symmetric; and (3) Spurious index autocorrelation
induced by nontrading is higher (or lower) when there is positive
(or negative) persistence in nontrading. In principle, property (3) might
be sufficient to explain the magnitude of index autocorrelations in recent
stock market data. However, several calibration experiments indicate that the
degree of persistence in nontrading required to yield weekly autocorrelations
of 30% is empirically implausible (see Lo and MacKinlay [1990c] for
details).

One final direction for further investigation is the possibility of dependence
between the nontrading and virtual returns processes. If virtual returns
are taken to be new information, then the extent to which traders
exploit this information in determining when (and what) to trade will show
itself as correlation between \(r_{it}\) and \(\delta_{it}\). Many strategic considerations are
involved in models of information-based trading, and an empirical analysis
of such issues promises to be as challenging as it is exciting.

However, if it is indeed the case that return autocorrelation is induced
by information-based nontrading, in what sense is this autocorrelation spurious?
The premise of the extensive literature on nonsynchronous trading
is that nontrading is an outcome of institutional features such as lagged
adjustments and nonsynchronously reported prices. But if nonsynchronicity
is purposeful and informationally motivated, then the serial dependence it
induces in asset returns should be considered genuine, since it is the result
of economic forces rather than measurement error. In such cases, purely
statistical models of nontrading are clearly inappropriate and an economic
model of strategic interactions is needed.


3.2 The Bid-Ask Spread 


One of the most important characteristics that investors look for in an
organized financial market is liquidity, the ability to buy or sell significant


Some good illustrations of the kind of behavior that can arise from strategic
considerations are contained in Admati and Pfleiderer (1988, 1989), err iim and fo (19906),
Easley and O'Hara (1987, 1990), Kyle (1985), and Wang (1993, 199 f).




quantities of a security quickly, anonymously, and with relatively little price
impact. To maintain liquidity, many organized exchanges use marketmakers,
individuals who stand ready to buy or sell whenever the public wishes
to sell or buy. In return for providing liquidity, marketmakers are granted
monopoly rights by the exchange to post different prices for purchases and
sales: They buy at the bid price \(P_b\) and sell at a higher ask price \(P_a\). This
ability to buy low and sell high is the marketmaker's primary source of
compensation for providing liquidity, and although the bid-ask spread \(P_a - P_b\) is
rarely larger than one or two ticks (the NYSE Fact Book: 1994 Data reports
that the spread was $0.25 or less in 90.8% of the NYSE bid-ask quotes from
1994), over a large number of trades marketmakers can earn enough to
compensate them for their services.

The diminutive size of typical spreads also belies their potential
importance in determining the time-series properties of asset returns. For
example, Phillips and Smith (1980) show that most of the abnormal returns
associated with particular options trading strategies are eliminated
when the costs associated with the bid-ask spread are included. Blume
and Stambaugh (1983) argue that the bid-ask spread creates a significant
upward bias in mean returns calculated with transaction prices. More
recently, Keim (1989) shows that a significant portion of the so-called January
effect (the fact that smaller-capitalization stocks seem to outperform
larger-capitalization stocks over the few days surrounding the turn of the year)
may be attributable to closing prices recorded at the bid price at the end
of December and closing prices recorded at the ask price at the beginning
of January. Even if the bid-ask spread remains unchanged during this
period, the movement from bid to ask is enough to yield large portfolio
returns, especially for lower-priced stocks for which the percentage bid-ask
spread is larger. Since low-priced stocks also tend to be low-capitalization
stocks, Keim's (1989) results do offer a partial explanation for the January
effect.

The presence of the bid-ask spread complicates matters in several ways.
Instead of one price for each security, there are now three: the bid price,
the ask price, and the transaction price, which need not be either the bid
or the ask (although in some cases it is), nor need it lie in between the two
(although in most cases it does). How should returns be calculated: from
bid-to-bid, ask-to-bid, etc.? Moreover, as random buys and sells arrive at
the market, prices can bounce back and forth between the ask and the bid
prices, creating spurious volatility and serial correlation in returns, even if
the economic value of the security is unchanged.


Keim (1989) also documents the relation between other calendar anomalies (the weekend
effect, holiday effects, etc.) and systematic movements between the bid and ask prices.
To account for the impact of the bid-ask spread on the time-series properties
of asset returns, Roll (1984) proposes the following simple model. Denote
by \(P_t^*\) the time-\(t\) fundamental value of a security in a frictionless economy,
and denote by \(s\) the bid-ask spread (see Glosten and Milgrom [1985], for
example). Then the observed market price \(P_t\) may be written as

\[ P_t = P_t^* + I_t\,\frac{s}{2} \qquad (3.2.1) \]

\[ I_t \equiv \begin{cases} +1 & \text{with probability } \tfrac{1}{2} \text{ (buyer-initiated)} \\ -1 & \text{with probability } \tfrac{1}{2} \text{ (seller-initiated)} \end{cases} \qquad (3.2.2) \]

where \(I_t\) is an order-type indicator variable, indicating whether the transaction
at time \(t\) is at the ask (buyer-initiated) or at the bid (seller-initiated)
price. The assumption that \(P_t^*\) is the fundamental value of the security
implies that \(E[I_t] = 0\), hence \(\Pr(I_t = 1) = \Pr(I_t = -1) = \tfrac{1}{2}\). Assume for
the moment that there are no changes in the fundamentals of the security;
hence \(P_t^* = P^*\) is fixed through time. Then the process for price changes
\(\Delta P_t\) is given by

\[ \Delta P_t = (I_t - I_{t-1})\,\frac{s}{2}, \qquad (3.2.3) \]

and under the assumption that \(I_t\) is IID the variance, covariance, and
autocorrelation of \(\Delta P_t\) may be readily computed:

\[ \mathrm{Var}[\Delta P_t] = \frac{s^2}{2} \qquad (3.2.4) \]

\[ \mathrm{Cov}[\Delta P_{t-1}, \Delta P_t] = -\frac{s^2}{4} \qquad (3.2.5) \]

\[ \mathrm{Cov}[\Delta P_{t-k}, \Delta P_t] = 0, \quad k > 1 \qquad (3.2.6) \]

\[ \mathrm{Corr}[\Delta P_{t-1}, \Delta P_t] = -\frac{1}{2}. \qquad (3.2.7) \]


Despite the fact that fundamental value \(P^*\) is fixed, \(\Delta P_t\) exhibits volatility
and negative serial correlation as the result of bid-ask bounce. The intuition
is clear: If \(P^*\) is fixed so that prices take on only two values, the bid and
the ask, and if the current price is the ask, then the price change between
the current price and the previous price must be either 0 or \(s\), and the price
change between the next price and the current price must be either 0 or \(-s\).
The same argument applies if the current price is the bid, hence the serial
correlation between adjacent price changes is nonpositive. This intuition
applies more generally to cases where the order-type indicator \(I_t\) is not IID,
hence the model is considerably more general than it may seem.

The larger the spread \(s\), the higher the volatility and the larger the magnitude
of the (negative) first-order autocovariance, both increasing proportionally
with \(s^2\) so that the first-order autocorrelation remains constant at \(-\tfrac{1}{2}\).
Observe from (3.2.6) that the bid-ask spread does not induce any
higher-order serial correlation.
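A quick simulation makes the bid-ask bounce concrete. The sketch below draws IID order-type indicators \(I_t = \pm 1\) with \(P^*\) held fixed, as in (3.2.1)-(3.2.3), and compares sample moments of \(\Delta P_t\) with (3.2.4)-(3.2.7). The price level, spread, sample size, and function names are illustrative choices only.

```python
import random


def roll_price_changes(p_star: float, s: float, n: int, seed: int = 0) -> list[float]:
    """Simulate Delta P_t when P_t = P* + I_t * s/2, with fixed P* and IID I_t = +/-1."""
    rng = random.Random(seed)
    prices = [p_star + rng.choice((1, -1)) * s / 2 for _ in range(n)]
    return [curr - prev for prev, curr in zip(prices, prices[1:])]


def sample_autocov(x: list[float], lag: int) -> float:
    """Sample autocovariance at the given lag (divides by len(x))."""
    m = sum(x) / len(x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag)) / len(x)


if __name__ == "__main__":
    s = 0.25
    dp = roll_price_changes(p_star=50.0, s=s, n=200_000)
    var = sample_autocov(dp, 0)
    print("Var:  ", var, "theory:", s ** 2 / 2)                       # (3.2.4)
    print("Cov1: ", sample_autocov(dp, 1), "theory:", -s ** 2 / 4)    # (3.2.5)
    print("Cov2: ", sample_autocov(dp, 2), "theory:", 0.0)            # (3.2.6)
    print("Corr1:", sample_autocov(dp, 1) / var, "theory:", -0.5)     # (3.2.7)
```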

Now let the fundamental value \(P_t^*\) change through time, but suppose
that its increments are serially uncorrelated and independent of \(I_t\). Then
(3.2.5) still applies, but the first-order autocorrelation (3.2.7) is no longer
\(-\tfrac{1}{2}\) because of the additional variance of \(\Delta P_t^*\) in the denominator.
Specifically, if \(\sigma^2(\Delta P^*)\) is the variance of \(\Delta P_t^*\), then

\[ \mathrm{Corr}[\Delta P_{t-1}, \Delta P_t] = \frac{-s^2/4}{s^2/2 + \sigma^2(\Delta P^*)} \le 0. \qquad (3.2.8) \]


Although (3.2.5) shows that a given spread \(s\) implies a first-order
autocovariance of \(-s^2/4\), the logic may be reversed so that a given first-order
autocovariance implies a particular value for \(s\). Solving for \(s\) in
(3.2.5) yields

\[ s = 2\sqrt{-\mathrm{Cov}[\Delta P_{t-1}, \Delta P_t]}, \qquad (3.2.9) \]

hence \(s\) may be easily estimated from the sample autocovariances of price
changes (see the discussion in Section 3.4.2 regarding the empirical
implementation of (3.2.9) for further details).
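Roll's estimator (3.2.9) is easy to try on simulated data. This sketch assumes that \(P_t^*\) follows a Gaussian random walk whose increments are independent of \(I_t\), as in the setting of (3.2.8). The spread, volatility, seed, and function names are hypothetical. The estimate is returned as NaN when the sample first-order autocovariance is nonnegative, because the square root in (3.2.9) is then undefined.

```python
import math
import random


def simulate_roll_prices(s, sigma_star, n, p0=50.0, seed=1):
    """P_t = P*_t + I_t * s/2, where P*_t is a random walk with N(0, sigma_star^2)
    increments independent of the IID order-type indicator I_t (assumed setup)."""
    rng = random.Random(seed)
    p_star, prices = p0, []
    for _ in range(n):
        p_star += rng.gauss(0.0, sigma_star)
        prices.append(p_star + rng.choice((1, -1)) * s / 2)
    return prices


def roll_spread_estimate(prices):
    """Equation (3.2.9): s = 2 * sqrt(-Cov[dP_{t-1}, dP_t]); NaN if the sample
    first-order autocovariance of price changes is not negative."""
    dp = [b - a for a, b in zip(prices, prices[1:])]
    m = sum(dp) / len(dp)
    cov1 = sum((dp[t] - m) * (dp[t + 1] - m) for t in range(len(dp) - 1)) / len(dp)
    return 2 * math.sqrt(-cov1) if cov1 < 0 else float("nan")


def roll_first_order_corr(s, sigma_star):
    """Equation (3.2.8), with sigma_star the standard deviation of Delta P*."""
    return -(s ** 2 / 4) / (s ** 2 / 2 + sigma_star ** 2)


if __name__ == "__main__":
    prices = simulate_roll_prices(s=0.25, sigma_star=0.10, n=200_000)
    print("estimated spread:", roll_spread_estimate(prices))        # close to 0.25
    print("implied Corr (3.2.8):", roll_first_order_corr(0.25, 0.10))
```

Because the fundamental-value increments are serially uncorrelated, they leave the population autocovariance, and hence the spread implied by (3.2.9), unchanged. They do pull the autocorrelation toward zero, as in (3.2.8).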

Estimating the bid-ask spread may seem superfluous given the fact that
bid-ask quotes are observable. However, Roll (1984) argues that the quoted
spread may often differ from the effective spread, i.e., the spread between
the actual market prices of a sell order and a buy order. In many instances,
transactions occur at prices within the bid-ask spread, perhaps because
marketmakers do not always update their quotes in a timely fashion, or because
they wish to rebalance their own inventory and are willing to "better" their
quotes momentarily to achieve this goal, or because they are willing to
provide discounts to customers that are trading for reasons other than private
information (see Eikeboom [1993], Glosten and Milgrom [1985], Goldstein
[1993], and the discussion in the next section for further details). Roll's
(1984) model is one measure of this effective spread, and is also a means for

For example, serial correlation in \(I_t\) (of either sign) does not change the fact that bid-ask
bounce induces negative serial correlation in price changes, although it does affect the
magnitude. See Choi, Salandro, and Shastri (1988) for an explicit analysis of this case.

Roll (1984) argues that price changes must be serially uncorrelated in an informationally
efficient market. However, LeRoy (1973), Lucas (1978), and others have shown that this need
not be the case. Nevertheless, for short-horizon returns, e.g., daily or intradaily returns, it
is difficult to pose an empirically plausible equilibrium model of asset returns that exhibits
significant serial correlation.
accounting for the effects of the bid-ask spread on the time-series properties
of asset returns.


3.2.2 Components of the Bid-Ask Spread


Although Roll's model of the bid-ask spread captures one important aspect
of its effect on transaction prices, it is by no means a complete theory of
the economic determinants and the dynamics of the spread. In particular,
Roll (1984) takes \(s\) as given, but in practice the size of the spread is the
single most important quantity that marketmakers control in their strategic
interactions with other market participants. In fact, Glosten and Milgrom
(1985) argue convincingly that \(s\) is determined endogenously and is unlikely
to be independent of \(P^*\) as we have assumed in Section 3.2.1.

Other theories of the marketmaking process have decomposed the
spread into more fundamental components, and these components often
behave in different ways through time and across securities. Estimating the
separate components of the bid-ask spread is critical for properly implementing
these theories with transactions data. In this section we shall turn
to some of the econometric issues surrounding this task.

There are three primary economic sources for the bid-ask spread: order-processing
costs, inventory costs, and adverse-selection costs. The first two
consist of the basic setup and operating costs of trading and recordkeeping,
and the carrying of undesired inventory subject to risk. Although these costs
have been the main focus of earlier literature, it is the adverse-selection
component that has received much recent attention. Adverse-selection
costs arise because some investors are better informed about a security's
value than the marketmaker, and trading with such investors will, on
average, be a losing proposition for the marketmaker. Since marketmakers
have no way to distinguish the informed from the uninformed, they are
forced to engage in these losing trades and must be rewarded accordingly.
Therefore, a portion of the marketmaker's bid-ask spread may be viewed
as compensation for taking the other side of potential information-based
trades. Because this information component can have very different
statistical properties from the order-processing and inventory components, it is
critical to distinguish between them in empirical applications. To do so,
Glosten (1987) provides a simple asymmetric-information model that captures
the salient features of adverse selection for the components of the
bid-ask spread, and we shall present an abbreviated version of his elegant
analysis here (see also Glosten and Harris [1988] and Stoll [1989]).


See, for example, Amihud and Mendelson (1980), Bagehot (1971), Demsetz (1968), Ho
and Stoll (1981), Stoll (1978), and Tinic (1972).

See Bagehot (1971), Copeland and Galai (1983), Easley and O'Hara (1987), Glosten
(1987), Glosten and Harris (1988), Glosten and Milgrom (1985), and Stoll (1989).
Glosten's Decomposition

Denote by \(P_b\) and \(P_a\) the bid and ask prices, respectively, and let \(\bar{P}\) be the
"true" or common-information market price, the price that all investors
without private information (uninformed investors) agree upon. Under
risk-neutrality, the common-information price is given by \(\bar{P} = E[P^* \mid \Omega]\), where
\(\Omega\) denotes the common or public information set and \(P^*\) denotes the price
that would result if everyone had access to all information. The bid and ask
prices may then be expressed as the following sums:

\[ P_b = \bar{P} - A_b - C_b \qquad (3.2.10) \]

\[ P_a = \bar{P} + A_a + C_a \qquad (3.2.11) \]

\[ P_a - P_b = (A_a + A_b) + (C_a + C_b), \qquad (3.2.12) \]

where \(A_a + A_b\) is the adverse-selection component of the spread, to be
determined below, and \(C_a + C_b\) includes the order-processing and inventory
components, which Glosten calls the gross profit component and takes as
exogenous. If uninformed investors observe a purchase at the ask, then
they will revise their valuation of the asset from \(\bar{P}\) to \(\bar{P} + A_a\) to account for
the possibility that the trade was information-motivated, and similarly, if a
sale at the bid is observed, then \(\bar{P}\) will be revised to \(\bar{P} - A_b\). But how are \(A_a\)
and \(A_b\) determined?

Glosten assumes that all potential marketmakers have access to common
information only, and he defines their updating rule in response to
transactions at various possible bid and ask prices as

\[ a(x) \equiv E[P^* \mid \Omega \cup (\text{investor buys at } x)] \qquad (3.2.13) \]

\[ b(y) \equiv E[P^* \mid \Omega \cup (\text{investor sells at } y)] \qquad (3.2.14) \]

\(A_a\) and \(A_b\) are then given by the following relations:

\[ A_a = a(P_a) - \bar{P}, \qquad A_b = \bar{P} - b(P_b). \qquad (3.2.15) \]

Under suitable restrictions for \(a(\cdot)\) and \(b(\cdot)\), an equilibrium among competing
marketmakers will determine bid and ask prices so that the expected
profits from marketmaking activities will cover all costs, including \(C_a\)
and \(C_b\); hence

\[ P_a = a(P_a) + C_a = \bar{P} + \big(a(P_a) - \bar{P}\big) + C_a = \bar{P} + A_a + C_a \qquad (3.2.16) \]

\[ P_b = b(P_b) - C_b = \bar{P} - \big(\bar{P} - b(P_b)\big) - C_b = \bar{P} - A_b - C_b. \qquad (3.2.17) \]


See Amihud and Mendelson (1980); Cohen, Maier, Schwartz, and Whitcomb (1981); Ho
and Stoll (1981); and Stoll (1978) for models of these costs.
An immediate implication of (3.2.16) and (3.2.17) is that only a portion of
the total spread, \(C_a + C_b\), covers the basic costs of marketmaking, so that
the quoted spread \(A_a + A_b + C_a + C_b\) can be larger than Stoll's (1985)
"effective" spread (the spread between purchase and sale prices that occur
strictly within the quoted bid-ask spread), the difference being the
adverse-selection component \(A_a + A_b\). This accords well with the common practice
of marketmakers giving certain customers a better price than the quoted
bid or ask on certain occasions, presumably because these customers are
perceived to be trading for reasons other than private information, e.g.,
liquidity needs, index-portfolio rebalancing, etc.
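A small numerical illustration of (3.2.10)-(3.2.12) follows. The values of \(\bar{P}\), \(A_a\), \(A_b\), \(C_a\), and \(C_b\) are hypothetical and not taken from any data, and the class name is illustrative. The sketch computes the quotes and splits the quoted spread into its gross-profit and adverse-selection parts.

```python
from dataclasses import dataclass


@dataclass
class SpreadComponents:
    p_bar: float  # common-information price, P-bar
    a_ask: float  # adverse-selection component on the ask side, A_a
    a_bid: float  # adverse-selection component on the bid side, A_b
    c_ask: float  # gross-profit (order-processing + inventory) component, C_a
    c_bid: float  # gross-profit component on the bid side, C_b

    @property
    def ask(self) -> float:
        return self.p_bar + self.a_ask + self.c_ask      # (3.2.11)

    @property
    def bid(self) -> float:
        return self.p_bar - self.a_bid - self.c_bid      # (3.2.10)

    @property
    def quoted_spread(self) -> float:
        return self.ask - self.bid                       # (3.2.12)

    @property
    def gross_profit_spread(self) -> float:
        return self.c_ask + self.c_bid                   # covers marketmaking costs

    @property
    def adverse_selection_spread(self) -> float:
        return self.a_ask + self.a_bid


if __name__ == "__main__":
    quote = SpreadComponents(p_bar=40.0, a_ask=0.0625, a_bid=0.0625, c_ask=0.0625, c_bid=0.0625)
    print(quote.bid, quote.ask, quote.quoted_spread, quote.gross_profit_spread)
    # 39.875 40.125 0.25 0.125
```

With these hypothetical inputs the quoted spread is $0.25, of which only $0.125 (\(C_a + C_b\)) covers marketmaking costs. The remaining \(A_a + A_b\) is the adverse-selection wedge that separates the quoted spread from the effective spread.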


Implications for Transaction Prices

To derive the impact of these two components on transaction prices, denote
by \(P_n\) the price at which the \(n\)th transaction is consummated, and let

\[ P_n = P_a I_a + P_b I_b, \qquad (3.2.18) \]

where \(I_a\) (\(I_b\)) is an indicator function that takes on the value one if the
transaction occurs at the ask (bid) and zero otherwise. Substituting (3.2.16)-(3.2.17)
into (3.2.18) then yields

\[ P_n = E[P^* \mid \Omega \cup A]\,I_a + E[P^* \mid \Omega \cup B]\,I_b + C_a I_a - C_b I_b \qquad (3.2.19) \]

\[ P_n = \bar{P}_n + C_n Q_n, \qquad (3.2.20) \]

\[ \bar{P}_n \equiv E[P^* \mid \Omega \cup A]\,I_a + E[P^* \mid \Omega \cup B]\,I_b \qquad (3.2.21) \]

\[ C_n \equiv \begin{cases} C_a & \text{if buyer-initiated trade} \\ C_b & \text{if seller-initiated trade} \end{cases} \qquad (3.2.22) \]

\[ Q_n \equiv \begin{cases} +1 & \text{if buyer-initiated trade} \\ -1 & \text{if seller-initiated trade} \end{cases} \qquad (3.2.23) \]

where \(A\) is the event in which the transaction occurs at the ask and \(B\) is
the event in which the transaction occurs at the bid. Observe that \(\bar{P}_n\) is the
common information price after the \(n\)th transaction.

Although (3.2.20) is a decomposition that is frequently used in this
literature, Glosten's model adds an important new feature: correlation between
\(\bar{P}_n\) and \(Q_n\). If \(\bar{P}_{n-1}\) is the common information price before the \(n\)th transaction
and \(\bar{P}_n\) is the common information price afterwards, Glosten shows that

\[ \mathrm{Cov}[\bar{P}_n, Q_n \mid \bar{P}_{n-1}] = E[A_n \mid \bar{P}_{n-1}], \quad \text{where } A_n \equiv \begin{cases} A_a & \text{if } Q_n = +1 \\ A_b & \text{if } Q_n = -1. \end{cases} \qquad (3.2.24) \]

That \(\bar{P}_n\) and \(Q_n\) must be correlated follows from the existence of adverse
selection. If \(Q_n = +1\), the possibility that the buyer-initiated trade is
information-based will cause an upward revision in \(\bar{P}_n\), and for the same reason,
\(Q_n = -1\) will cause a downward revision in \(\bar{P}_n\). There is only one case in which
\(\bar{P}_n\) and \(Q_n\) are uncorrelated: when the adverse-selection component of the
spread is zero.


Implications for Transaction Price Dynamics

To derive implications for the dynamics of transaction prices, denote by \(e_n\)
the revisions in \(\bar{P}_n\) due to the arrival of new public information between
trades \(n-1\) and \(n\). Then the common information price after the \(n\)th transaction may be written as

\[ \bar{P}_n = \bar{P}_{n-1} + e_n + A_n Q_n. \qquad (3.2.25) \]

Taking the first difference of (3.2.20) then yields

\[ P_n - P_{n-1} = (\bar{P}_n - \bar{P}_{n-1}) + (C_n Q_n - C_{n-1} Q_{n-1}) \qquad (3.2.26) \]

\[ \quad = A_n Q_n + e_n + (C_n Q_n - C_{n-1} Q_{n-1}), \qquad (3.2.27) \]

which shows that transaction price changes are comprised of a gross-profits
component which, like Roll's (1984) model of the bid-ask spread, exhibits
reversals, and an adverse-selection component that tends to be permanent.
Therefore, Glosten's attribution of the effective spread to the gross-profits
component is not coincidental, but well-motivated by the fact that it is
this component that induces negative serial correlation in returns, not the
adverse-selection component. Accordingly, Glosten (1987) provides
alternative relations between spreads and return covariances which incorporate
this distinction between the adverse-selection and gross-profits components.
In particular, under certain simplifying assumptions Glosten shows that

\[ E[R_M] \approx E[R_T] + \gamma\,\frac{\bar{s}^2}{4}, \qquad \mathrm{Cov}[r_{M,t-1}, r_{M,t}] \approx -\gamma\,\frac{\bar{s}^2}{4}, \qquad (3.2.28) \]

where

\[ \bar{s} \equiv \frac{P_a - P_b}{(P_a + P_b)/2}, \qquad \gamma \equiv \frac{C}{C + A}, \]

and where \(R_M\) and \(R_T\) are the per-period market and true returns, respectively,
and \(r_M\) is the continuously compounded per-period market return.

These relations show that the presence of adverse selection (\(\gamma < 1\)) has an
additional impact on means and covariances of returns that is not captured
by other models of the bid-ask spread. Whether or not the adverse-selection

Specifically, he assumes that: (1) True returns are independent of all past history; (2)
The spread is symmetric about the true price; and (3) The gross-profit component does not
cause conditional drift in prices.
component is economically important is largely an empirical issue that has
yet to be determined decisively; nevertheless, Glosten's (1987) model shows
that adverse selection can have very different implications for the statistical
properties of transactions data than other components of the bid-ask spread.
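The reversal-versus-permanence distinction in (3.2.26)-(3.2.27) can be illustrated by simulation. The sketch assumes symmetric, constant components \(A_a = A_b = A\) and \(C_a = C_b = C\), IID \(Q_n = \pm 1\) with equal probability, and IID normal public-news revisions \(e_n\). All parameter values and function names are illustrative. Under these assumptions \(\Delta P_n = (A + C)Q_n - CQ_{n-1} + e_n\), whose first-order autocovariance is \(-C(A + C) = -\gamma s^2/4\) with \(s = 2(A + C)\) and \(\gamma = C/(C + A)\). This is a price-level analog of the covariance relation in (3.2.28). Setting \(C = 0\) leaves only the permanent adverse-selection revisions and removes the serial correlation.

```python
import random


def simulate_glosten(a, c, sigma_e, n, p0=50.0, seed=2):
    """Transaction prices under (3.2.20) and (3.2.25) with A_a = A_b = a and
    C_a = C_b = c, IID Q_n = +/-1 and IID N(0, sigma_e^2) news revisions e_n."""
    rng = random.Random(seed)
    p_bar, prices = p0, []
    for _ in range(n):
        q_n = rng.choice((1, -1))
        p_bar += rng.gauss(0.0, sigma_e) + a * q_n   # (3.2.25): permanent revision
        prices.append(p_bar + c * q_n)               # (3.2.20): transitory gross-profit part
    return prices


def lag1_autocov(prices):
    """Sample first-order autocovariance of transaction price changes."""
    dp = [y - x for x, y in zip(prices, prices[1:])]
    m = sum(dp) / len(dp)
    return sum((dp[t] - m) * (dp[t + 1] - m) for t in range(len(dp) - 1)) / len(dp)


if __name__ == "__main__":
    a, c = 0.04, 0.06
    s = 2 * (a + c)                # quoted spread in price units
    gamma = c / (c + a)
    prices = simulate_glosten(a, c, sigma_e=0.05, n=200_000)
    print("sample lag-1 autocov:", lag1_autocov(prices))
    print("-gamma * s**2 / 4:   ", -gamma * s ** 2 / 4)        # equals -c * (a + c)
    print("adverse selection only (c = 0):",
          lag1_autocov(simulate_glosten(a, 0.0, 0.05, 200_000)))
```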


3.3 Modeling Transactions Data 


One of the most exciting recent developments in empirical finance is the 
availability of low-cost transactions databases: historical prices, quantities, 
bid-ask quotes and sizes, and associated market conditions, transaction by 
transaction and time-stamped to the nearest second. For example, the 
NYSE's Trades and Quotes (TAQ) database contains all equity transactions 
reported on the Consolidated Tape from 1992 to the present, which includes 
all transactions on the NYSE, AMEX, NASDAQ, and the regional exchanges. 
The Berkeley Options Database provides similar data for options
transactions, and transactions databases for many other securities and markets are
being developed as interest in market microstructure issues continues to 
grow. 

The advent of such transactions databases has given financial economists
the means to address a variety of issues surrounding the fine structure of
the trading process or price discovery. For example, what are the determinants
of the bid-ask spread, and is adverse selection a more important factor than
inventory costs in explaining marketmaking behavior? Does the very act
of trading move prices, and if so, how large is this price-impact effect and how
does it vary with the size of the trade? Why do prices tend to fall more
often on whole-dollar multiples than on half-dollar multiples, more often
on half-dollar multiples than on quarter-dollar multiples, etc.? What are
the benefits and costs of other aspects of a market's microstructure, such as
margin requirements, the degree of competition faced by dealers, the
frequency that orders are cleared, and intraday volatility? Although none of


Recent attempts to quantify the relative contributions of order-processing/inventory costs
and adverse-selection costs to the bid-ask spread include: Affleck-Graves, Hegde, and Miller
(1994), Glosten and Harris (1988), George, Kaul, and Nimalendran (1991), Huang and Stoll
(19952), and Stoll (1989). See Section 3.4.2 for further discussion.

See Amihud and Mendelson (1980), Bagehot (1971), Copeland and Galai (1983),
Demsetz (1968), Easley and O'Hara (1987), Glosten (1987), Glosten and Harris (1988), Glosten
and Milgrom (1985), Ho and Stoll (1981), Stoll (1978, 1989), and Tinic (1972).

See Bertsimas and Lo (1996), Chan and Lakonishok (1993b, 1995), and Keim and
Madhavan (99a, b, 1996).

See Ball, Torous, and Tschoegl (1985); Christie, Harris, and Schultz (1994); Christie
and Schultz (1994); Goodhart and Curcio (1990); Harris (1991); Niederhoffer (1965, 1966);
Niederhoffer and Osborne (1966); and Osborne (1962).

See Cohen, Maier, Schwartz, and Whitcomb (1986), Harris, Sofianos, and Shapiro (1994),
Hasbrouck (199 a, b), Madhavan and Smidt (1991), and Stoll and Whaley (1990).




these questions are new to the recent literature, the kind of answers we can 
provide have changed dramatically, thanks to transactions data. Even the 
event study, which traditionally employs daily returns data, has been applied 
recently to transactions data to sift out the impact of news announcements 
within the day (see, for example, Barclay and Litzenberger [1988]). 

The richness of these datasets does not come without a price: transactions
datasets are considerably more difficult to manipulate and analyze
because of their sheer size. For example, in 1994 the NYSE consummated
over 49 million transactions, and for each transaction, the NYSE's Trades
and Quotes (TAQ) database records several pieces of information: transaction
price, time of trade, volume, and various condition codes describing
the trade. Bid-ask quotes and depths are also recorded. Even for
individual securities, a sample size of 100,000 observations for a single year of
transactions data is not unusual.


3.3.1 Motivation


Transactions data pose a number of unique econometric challenges that
do not easily fit into the framework we have developed so far. For
example, transactions data are sampled at irregularly spaced random intervals
(whenever trades occur), and this presents a number of problems for
standard econometric models: observations are unlikely to be identically
distributed (since some observations are very closely spaced in time while others
may be separated by hours or days), it is difficult to capture seasonal effects
(such as time-of-day regularities) with simple indicator functions, and
forecasting is no longer a straightforward exercise because the transaction times
are random.

Also, transaction prices are always quoted in discrete units or ticks:
currently $0.125 for equities, $0.0625 for equity options, $0.05 for futures
contracts on the Standard and Poor's 500 index, $0.03125 for US Treasury
bonds and notes, and so on. While there are no a priori theoretical reasons
to rule out continuous prices, the transactions costs associated with quoting
and processing such prices make them highly impractical. Of course,


Despite the indivisibilities that accompany price discreteness, there seems to be general
agreement among economists and practitioners alike that the efficiency gains from discrete
prices outweigh the potential costs of indivisible trading lots. However, an unresolved
issue is the optimal degree of discreteness, which balances the costs of indivisibilities against the
benefits of discreteness. For example, on the NYSE, the minimum price movement of stocks
with prices greater than or equal to $1 is one tick, but this minimum price variation was set
years ago before the advent of high-speed digital computers and corresponding electronic
trading mechanisms. It is unclear whether or not an eighth of a dollar is the optimal degree
of discreteness today. Indeed, recent discussions between the NYSE and the US Securities
and Exchange Commission seem to indicate a move towards decimalization, under which prices
and quotes are denominated in cents. See Ball, Torous, and Tschoegl (1985); Brennan and
Copeland (1988); Dats (UH; and the SEC's (1994) Market 2000 study for further discussion.
Table 3.1. Summary statistics for daily returns of five NYSE stocks.

Statistic          AAC       APD       CBS       CCB       KAB
P_max ($)        5.950    86.750   216.500   629.750     7.250
P_min ($)        1.375    40.625   129.000   360.250     9.875
P_mean ($)       3.353    55.878   173.924   467.844     4.665
σ(P) ($)         0.811    11.380    18.877    53.951     0.816
R_max (%)        91.43      6.48      6.58      4.94     16.13
R_min (%)        14.29     -5.49     -7.83     -9.43    -12.50
R_mean (%)        0.12      0.11      0.02     -0.00      0.00
σ(R) (%)          4.88      1.61      1.45      1.46      3.48

Summary statistics for daily returns data from January 2, 1990, to December 31, 1992, for five
NYSE stocks: AAC = Anacomp; APD = Air Products and Chemicals; CBS = Columbia
Broadcasting System; CCB = Capital Cities ABC; KAB = Kaneb Services.


discreteness is less problematic for coarser-sampled data, which may be well- 
approximated by a continuous-state process. But it becomes more relevant 
for transaction price changes, since such finely sampled price changes typ- 
ically take on only a few distinct values. For example, the NYSE Fact Book: 
1994 Data reports that in 1994, 97.4% of all transactions on the NYSE oc- 
curred with no change or a one-tick price change. Moreover, price changes 
greater than 4 ticks are extremely rare, as documented in Hausman, Lo, 
and MacKinlay (1992). 

Discreteness and Prices

Discreteness affects both prices and returns, but in somewhat different ways.
With respect to prices, several studies have documented the phenomenon
of price clustering, the tendency for prices to fall more frequently on certain
values than on others. For example, Figure 3.2a displays the histograms
of the fractional part of the daily closing prices of the following five NYSE
stocks during the three-year period from January 2, 1990, to December 31,
1992 (see Table 3.1 for summary statistics): Anacomp (AAC), Air Products
and Chemicals (APD), Columbia Broadcasting System (CBS), Capital Cities

See, for example, Ball, Torous, and Tschoegl (1985); Goodhart and Curcio (1990); Harris
(1991); Niederhoffer (1965, 1966); Niederhoffer and Osborne (1966); and Osborne (1962).
ABC (CCB), and Kaneb Services (KAB). The histogram for CBS is a particularly
good illustration of the classic price-clustering pattern: Prices tend
to fall more frequently on whole-dollar multiples than on half-dollar
multiples, more frequently on half-dollars than on quarter-dollars, and more
frequently on even eighths than on odd eighths. Price clustering is even
more pronounced for transactions data.
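A minimal way to tabulate price clustering is to classify the fractional part of each price by its position on the eighths grid. The prices below are hypothetical and are not the CBS closing prices behind Figure 3.2a. The category labels and function names are illustrative.

```python
from collections import Counter


def eighth_category(price: float) -> str:
    """Classify the fractional part of a price quoted in eighths of a dollar."""
    eighths = round((price % 1) * 8) % 8   # 0..7; a value of 8 wraps to a whole dollar
    if eighths == 0:
        return "whole dollar"
    if eighths == 4:
        return "half dollar"
    if eighths in (2, 6):
        return "quarter"
    return "odd eighth"


def clustering_histogram(prices):
    """Count how many prices fall in each clustering category."""
    return Counter(eighth_category(p) for p in prices)


if __name__ == "__main__":
    # Hypothetical closing prices, for illustration only.
    closes = [173.0, 173.5, 174.125, 172.75, 175.0, 171.25, 174.0, 173.375]
    print(clustering_histogram(closes))
```

Applied to a long price history, a pattern in which whole-dollar counts exceed half-dollar counts, which in turn exceed quarter and odd-eighth counts, is the clustering described above.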

The importance of these patterns of discreteness has been highlighted
by the recent controversy and litigation surrounding the publication of
two empirical studies by Christie and Schultz (1994) and Christie, Harris,
and Schultz (1994). They argue that the tendency for bid-ask quotes on
NASDAQ stocks to cluster more frequently on even eighths than on odd
eighths is an indication of tacit collusion among NASDAQ dealers to
maintain wider spreads. Of course, there are important differences between the
NASDAQ's market structure and those of other organized exchanges, and
more detailed analysis is required to determine if such differences can
explain the empirical regularities documented by Christie and Schultz (1994)
and Christie, Harris, and Schultz (1994). Although the outcome of this
controversy is yet to be decided, all parties concerned would agree that
discreteness can have a tremendous impact on securities markets.


Discreteness and Returns
The empirical relevance of discreteness for returns depends to a large extent
on the holding period and the price level, for reasons that we shall discuss
below. For transactions data, discreteness is considerably more problematic
because the price change from one transaction to the next is typically only
one or two ticks. For example, if the minimum price variation is an eighth of
a dollar, a stock currently priced at $10 a share can never yield a transaction
return between zero and 1.25%. In fact, in this case, the transaction return
must fall on a discrete “grid” of integer multiples of 1.25%. For
higher-priced stocks, this grid is considerably finer: the transaction return
for a $50 stock, for example, will fall on a grid of integer multiples of 0.25%.
Moreover, as the price level varies through time, the collection of transaction
returns obtained may seem less discrete because the grid corresponding
to the entire dataset will be the superposition of the grids at each price
level. Therefore, if price levels are high and volatile, or if the timespan of
the dataset is long (which implies higher price variability under a random
walk model for prices), the discreteness of transaction returns will be less
apparent.
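The grid argument can be checked with a few lines of Python. The sketch below assumes a tick of $1/8, as in the text, and the helper names are hypothetical. It lists the attainable one-transaction returns near a given price and shows how pooling several nearby price levels fills in the grid.

```python
from fractions import Fraction

TICK = Fraction(1, 8)  # assumed minimum price variation: $1/8


def return_grid(price, ticks=range(-3, 4), tick=TICK):
    """Attainable one-transaction simple returns, in percent, from `price`.

    A move of k ticks from price P gives a return of k * tick / P, so the
    returns lie on a grid with spacing tick / P.
    """
    p = Fraction(price)
    return [float(100 * k * tick / p) for k in ticks]


def pooled_grid(prices, ticks=range(-3, 4), tick=TICK):
    """Superpose the grids from several price levels (sorted, duplicates removed)."""
    points = {100 * k * tick / Fraction(p) for p in prices for k in ticks}
    return sorted(float(x) for x in points)


print(return_grid(10))  # [-3.75, -2.5, -1.25, 0.0, 1.25, 2.5, 3.75]
print(return_grid(50))  # [-0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75]
print(len(pooled_grid([10])), len(pooled_grid([10, 10.125, 10.25])))  # 7 19
```

Exact fractions are used so that coinciding grid points are recognized as duplicates rather than separated by floating-point noise.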

Table 3.2 contains a concrete example of this intuition. It reports the
relative frequencies of transaction price changes for the five stocks in
Figure 3.2 using all of the stocks' transactions during the 1991 calendar year.
The lower-priced stocks, KAB and AAC, have very few transaction price
changes beyond the −1 tick to +1 tick range; these three values account
for 99.6% and 99.3% of all the trades for KAB and AAC, respectively. In
contrast, for a higher-priced stock like CCB, with an average price of $468
during 1991, the range from −1 tick to +1 tick accounts for 63.0% of its
trades. While discreteness is relatively less pronounced for CCB, it is
nevertheless still present. Even when we turn to daily data, the histograms of
daily price changes in Figure 3.2b show that discreteness can still be
important, especially for lower-priced stocks such as KAB and AAC.

9 Other contributions to the NASDAQ controversy include Chan, Christie, and Schultz
(1995), Furbush and Smith (1976), Godek (1996), Grossman, Miller, Fischel, Cone, and Ross
(1995), Huang and Stoll (1995b), Kandel and Marx (1996), and Kleidon and Willig (1995).

Figure 3.2. Histogram of Daily Price Fractions and Price Changes for Five NYSE

Table 3.2. Relative frequencies of price changes for tick data of five stocks

        Number                      Price change (ticks)
Stock   of Trades   ≤−4     −3      −2      −1      0       +1      +2      +3      ≥+4
AAC     18,056      0.02    ?       ?       12.44   74.54   12.58   0.18    ?       0.18
APD     20,905      0.32    ?       3.22    13.48   64.40   14.23   ?       0.41    0.39
CBS     24,315      ?       6.61    7.35    7.26    52.42   7.93    7.42    6.31    2.43
CCB     ?           15.72   0.70    ?       3.90    55.11   4.56    1.89    0.58    15.85
KAB     21,008      0.00    0.00    ?       11.77   75.79   12.04   0.15    0.00    0.07

Relative frequency count, in percent, for all 1991 transaction price changes in ticks for five
NYSE stocks: AAC = Anacomp; APD = Air Products and Chemicals; CBS = Columbia Broadcasting
System; CCB = Capital Cities ABC; KAB = Kaneb Services.
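A tabulation in the spirit of Table 3.2 can be sketched as follows. The function name and the assumed $1/8 tick are illustrative; the input is simply a list of consecutive transaction prices for one stock, and changes of four or more ticks are pooled as in the table's end columns.

```python
from collections import Counter


def tick_change_frequencies(prices, tick=0.125, cap=4):
    """Percentage frequencies of transaction price changes measured in ticks.

    Changes of `cap` or more ticks in absolute value are pooled into the two
    end categories, like the <= -4 and >= +4 columns of Table 3.2.
    """
    counts = Counter()
    for prev, curr in zip(prices, prices[1:]):
        k = round((curr - prev) / tick)  # price change in whole ticks
        k = max(-cap, min(cap, k))       # pool the tails
        counts[k] += 1
    n = sum(counts.values())
    return {k: 100.0 * counts[k] / n for k in range(-cap, cap + 1)}


# Hypothetical prices: changes of +1, 0, -1, +6 (pooled into +4), and -1 ticks.
freq = tick_change_frequencies([10, 10.125, 10.125, 10, 10.75, 10.625])
print(freq[-1] + freq[0] + freq[1])  # share of trades within one tick: 80.0
```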

Moreover, discreteness may be more evident in the conditional and joint
distribution of high-frequency returns, even if it is difficult to detect in the
unconditional or marginal distributions. For example, consider the graphs
in Figure 3.3a in which pairs of adjacent daily simple returns \((R_t, R_{t+1})\) are
plotted for each of the five stocks in Table 3.1 over the three-year sample
period. These m-histories (here, m = 2) are often used to detect structure in
nonlinear dynamical systems (see Chapter 12). The scales of the two axes
are identical for all five stocks to make cross-stock comparisons meaningful,
and range from −5% to 5% in Figure 3.3a, −10% to 10% in Figure 3.3b,
and −20% to 20% in Figure 3.3c.

Figure 3.3a shows that there is considerable structure in the returns of
the lower-priced stocks, KAB and AAC; this is a radially symmetric structure
that is solely attributable to discreteness. In contrast, no structure is evident
in the 2-histories of the higher-priced stocks, CBS and CCB. Since APD's
initial price is in between those of the other four stocks, it displays less
structure than the lower-priced stocks but more than the higher-priced stocks.
Figures 3.3b and 3.3c show that changing the scale of the plots can often
reduce and, in the case of APD, completely obscure the regularities
associated with discreteness. For further discussion of these 2-histories, see
Crack and Ledoit (1996).

[Figure 3.3 panel: 2-History of KAB Returns, P = $4.665]
Figure 3.3. 2-Histories of Daily Stock Returns for Five NYSE Stocks from January 2, 1990
to December 31, 1992
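Constructing the 2-histories themselves is simple; only the plotting is omitted here. A minimal sketch, with hypothetical function names and made-up prices on a $1/8 grid, that forms overlapping m-tuples of simple returns from a price series:

```python
def simple_returns(prices):
    """Simple returns R_t = P_t / P_{t-1} - 1 from a list of prices."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def m_histories(returns, m=2):
    """Overlapping m-tuples (R_t, ..., R_{t+m-1}); m = 2 gives the pairs of Figure 3.3."""
    return [tuple(returns[i:i + m]) for i in range(len(returns) - m + 1)]


# Hypothetical closing prices of a low-priced stock moving one tick at a time.
closes = [4.0, 4.125, 4.0, 4.125, 4.0, 4.125]
pairs = m_histories(simple_returns(closes))
print(len(pairs), len(set(pairs)))  # 4 2: discreteness stacks pairs on few points
```

With a coarse grid relative to the price level, many pairs land on exactly the same point, which is the mechanism behind the structure visible for KAB and AAC.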




These empirical observations have motivated several explicit models of
price discreteness, and we shall discuss the strengths and weaknesses of each
of these models in the following sections.


3.3.2 Rounding and Barrier Models


Several models of price discreteness begin with a "true" but unobserved
continuous-state price process \(P_t\), and obtain the observed price process \(P_t^o\)
by discretizing \(P_t\) in some fashion (see, for example, Ball [1988], Cho and
Frees [1988], and Gottlieb and Kalay [1985]). Although this may be a
convenient starting point, the use of the term "true" price for the
continuous-state price process in this literature is an unfortunate choice of
terminology, since it implies that the discrete observed price is an
approximation to the true price when, in fact, the reverse is true:
continuous-state models are approximations to actual market prices, which
are discrete. When the approximation errors inherent in continuous-state
models are neglected, this can yield misleading inferences, especially for
transactions data.¹⁰


Rounding Errors
To formalize this notion of approximation error, denote by \(X_t\) the gross
return of the continuous-state process \(P_t\) between \(t-1\) and \(t\), i.e.,
\(X_t \equiv P_t/P_{t-1}\). We shall measure the impact of discreteness by comparing
\(X_t\) to the gross returns process \(X_t^o \equiv P_t^o/P_{t-1}^o\) corresponding to a
discretized price process \(P_t^o\).

The most common method of discretizing \(P_t\) is to round it to a multiple
of \(d\), the minimum price variation increment. To formalize this, we shall
require the floor and ceiling functions¹¹

\[ \lfloor x \rfloor \equiv \text{greatest integer} \le x \quad \text{(floor function)} \tag{3.3.1} \]

\[ \lceil x \rceil \equiv \text{least integer} \ge x \quad \text{(ceiling function)}, \tag{3.3.2} \]


for any real number \(x\). Using (3.3.1) and (3.3.2), we can express the three
most common methods of discretizing \(P_t\) compactly as

\[ P_t^o = \left\lfloor \frac{P_t}{d} \right\rfloor d \tag{3.3.3} \]

\[ P_t^o = \left\lceil \frac{P_t}{d} \right\rceil d \tag{3.3.4} \]

\[ P_t^o = \left\lfloor \frac{P_t}{d} + \frac{1}{2} \right\rfloor d, \tag{3.3.5} \]

where the first method rounds down, the second rounds up, and the third
rounds to the nearest multiple of \(d\). For simplicity, we shall consider only
(3.3.3), although our analysis easily extends to the other two methods.

10 The question of which price is the "true" price may not be crucial for the statistical
aspects of models of discreteness (after all, whether one is an approximation to the other or
vice versa affects only the sign of the approximation error, not its absolute magnitude), but
it is central to the motivation and interpretation of the results (see the discussion at the end
of Section 3.3.2 for examples). Therefore, although we shall adopt the terminology of this
literature for the moment, the reader is asked to keep this ambiguity in mind while reading
this section.

11 For further properties and applications of these integer functions, see Graham, Knuth,
and Patashnik (1989, Chapter 3).
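The three discretization rules (3.3.3)–(3.3.5) map directly onto the standard floor and ceiling functions. A minimal sketch, assuming prices in dollars and d = 1/8; the function names are illustrative:

```python
import math


def round_down(p, d=0.125):
    """(3.3.3): largest multiple of d that does not exceed p."""
    return math.floor(p / d) * d


def round_up(p, d=0.125):
    """(3.3.4): smallest multiple of d that is not below p."""
    return math.ceil(p / d) * d


def round_nearest(p, d=0.125):
    """(3.3.5): nearest multiple of d; exact halfway cases round up."""
    return math.floor(p / d + 0.5) * d


print(round_down(10.2), round_up(10.2), round_nearest(10.2))  # 10.125 10.25 10.25
```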

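At the heart of the discreteness issue is the difference between the return
\(X_t\) based on continuous-state prices and the return \(X_t^o\) based on discretized
prices. To develop a sense of just how different these two returns can be,
we shall construct an upper bound for the quantity \(|X_t^o - X_t| = |R_t^o - R_t|\),
where \(R_t\) and \(R_t^o\) denote the simple net returns of the continuous-state and
discretized price processes, respectively. Let \(x\) and \(y\) be any two arbitrary
nonnegative real numbers such that \(y > 1\), and observe that, since
\(\lfloor x \rfloor > x-1\), \(\lfloor x \rfloor \le x\), and \(y-1 < \lfloor y \rfloor \le y\),

\[ \frac{x-1}{y} < \frac{\lfloor x \rfloor}{\lfloor y \rfloor} < \frac{x}{y-1}. \tag{3.3.6} \]

Subtracting \(x/y\) from (3.3.6) then yields

\[ -\frac{1}{y} < \frac{\lfloor x \rfloor}{\lfloor y \rfloor} - \frac{x}{y} < \frac{x}{y(y-1)}, \tag{3.3.7} \]

which implies the inequality

\[ \left| \frac{\lfloor x \rfloor}{\lfloor y \rfloor} - \frac{x}{y} \right| < \mathrm{Max}\left[ \frac{1}{y},\ \frac{x}{y(y-1)} \right]. \tag{3.3.8} \]

Assuming that \(P_t > d\) for all \(t\), we may set \(x \equiv P_t/d\), \(y \equiv P_{t-1}/d\) and
substitute these expressions into (3.3.8); since \(1/y = \delta_{t-1}\) and
\(x/[y(y-1)] = X_t\,\delta_{t-1}/(1-\delta_{t-1})\), we obtain the following upper bound:

\[ |R_t^o - R_t| < \frac{\delta_{t-1}}{1-\delta_{t-1}}\,\mathrm{Max}\left[ X_t,\ 1-\delta_{t-1} \right] \equiv L(X_t, \delta_{t-1}), \tag{3.3.9} \]

where \(\delta_{t-1} \equiv d/P_{t-1}\) is defined to be the grid size at time \(t-1\).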

Although the upper bound (3.3.9) is a strict inequality, it is in fact the
least upper bound, i.e., for any fixed \(d\) and any \(\epsilon > 0\), there always exists some
combination of \(P_{t-1}\) and \(X_t\) for which \(|R_t^o - R_t|\) exceeds \(L(X_t, \delta_{t-1}) - \epsilon\).
Therefore, (3.3.9) measures the worst-case deviation of \(R_t^o\) from \(R_t\), and it
is the tightest of all such measures.

Note that (3.3.9) does not yield a uniform upper bound, since \(L\)
depends on \(X_t\):

\[ L(X_t, \delta_{t-1}) = \frac{\delta_{t-1}}{1-\delta_{t-1}}\,\mathrm{Max}\left[ X_t,\ 1-\delta_{t-1} \right]. \tag{3.3.10} \]

Nevertheless, it still provides a useful guideline for the impact of discreteness
on returns as prices and returns vary. For example, (3.3.9) formalizes the
intuition that discreteness is less problematic for higher-priced stocks, since
\(L\) is an increasing function of \(\delta_{t-1}\) and, therefore, a decreasing function
of \(P_{t-1}\).
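The bound (3.3.9) can be spot-checked numerically. The sketch below is a hypothetical Monte Carlo check, not a procedure from the literature cited here: it draws random previous prices and gross returns, rounds both prices down as in (3.3.3), and reports the largest ratio of the realized discrepancy \(|R_t^o - R_t|\) to \(L(X_t, \delta_{t-1})\). A value below 1 means the bound held in every draw. The price and return ranges are arbitrary assumptions.

```python
import math
import random


def upper_bound(x, delta):
    """L(X_t, delta_{t-1}) from (3.3.9): delta/(1 - delta) * Max[X_t, 1 - delta]."""
    return delta / (1.0 - delta) * max(x, 1.0 - delta)


def worst_ratio(d=0.125, n=100_000, seed=1):
    """Largest |R^o_t - R_t| / L(X_t, delta_{t-1}) over random draws.

    Hypothetical design: P_{t-1} uniform on [$1, $200], X_t uniform on
    [0.5, 1.5]; both prices are rounded down to multiples of d as in (3.3.3).
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n):
        p_prev = rng.uniform(1.0, 200.0)
        x = rng.uniform(0.5, 1.5)
        p_now = p_prev * x  # at least $0.50, so it stays above d as assumed
        po_prev = math.floor(p_prev / d) * d
        po_now = math.floor(p_now / d) * d
        gap = abs(po_now / po_prev - x)  # |R^o_t - R_t| = |X^o_t - X_t|
        worst = max(worst, gap / upper_bound(x, d / p_prev))
    return worst


print(worst_ratio() < 1.0)  # True: the bound held in every draw
```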

It is important to keep in mind that (3.3.9) is only an upper bound,
and while it does provide a measure of the worst-case discrepancy between
\(R_t^o\) and \(R_t\), it is not a measure of the discrepancy itself.¹² This distinction is
best understood by grappling with the fact that the expected upper bound
\(E[L(X_t, \delta_{t-1}) \mid P_{t-1}]\) is an increasing function of the mean and variance of \(X_t\):
the larger the expected return and volatility, the larger is the average value of
the upper bound. This seems paradoxical because it is generally presumed
that discreteness is less problematic for longer-horizon returns, but these
have higher means and variances by construction. The paradox is readily
resolved by observing that although the expected upper bound increases
as the mean and variance increase, the probability mass of \(|R_t^o - R_t|\) near
the upper bound may actually decline. Therefore, although the expected
worst-case discrepancy increases with the mean and variance, the probability
that such discrepancies are realized is smaller. Also, as we shall see below,
the expected upper bound seems to be relatively insensitive to changes in
the mean and variance of \(X_t\), so that when measured as a percentage of the
expected return \(E[R_t]\), the expected upper bound does decline for
longer-horizon returns.


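By specifying a particular process for \(P_t\), we can evaluate the expectation
of \(L(\cdot)\) to develop some sense for the magnitudes of expected discreteness
bias \(E|R_t^o - R_t|\) that are possible. For example, let \(P_t\) follow a geometric
random walk with drift \(\mu\) and diffusion coefficient \(\sigma\) so that \(\log(P_t/P_{t-1})\) are
IID normal random variables with mean \(\mu\) and variance \(\sigma^2\). In this case, we
have

\[ E[L(X_t, \delta_{t-1}) \mid P_{t-1}] = \frac{\delta_{t-1}}{1-\delta_{t-1}}\, e^{\mu + \sigma^2/2} \left[ 1 - \Phi\!\left( \frac{\log(1-\delta_{t-1}) - \mu - \sigma^2}{\sigma} \right) \right] + \delta_{t-1}\, \Phi\!\left( \frac{\log(1-\delta_{t-1}) - \mu}{\sigma} \right), \tag{3.3.11} \]

where \(\Phi(\cdot)\) is the normal CDF.¹³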
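Formula (3.3.11) is easy to evaluate directly. The sketch below assumes, following the notes to Tables 3.3a–c, that \(\mu\) and \(\sigma\) are chosen so that the annual gross return is lognormal with mean 1 + m and standard deviation s, and that daily values use \(\mu/360\) and \(\sigma/\sqrt{360}\); the function names are hypothetical. For the $1 and $5 cases with m = s = 10%, where the second, option-like term is negligible, it returns 14.2895 and 2.5648, matching the first entries of Table 3.3a.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def calibrate(m, s):
    """Lognormal (mu, sigma) for a gross return with mean 1 + m and std. dev. s."""
    sigma2 = math.log(1.0 + (s / (1.0 + m)) ** 2)
    return math.log(1.0 + m) - 0.5 * sigma2, math.sqrt(sigma2)


def expected_upper_bound(p_prev, mu, sigma, d=0.125):
    """E[L(X_t, delta_{t-1}) | P_{t-1}] from (3.3.11), with log X_t ~ N(mu, sigma^2)."""
    delta = d / p_prev
    k = math.log(1.0 - delta)  # log of the "strike" 1 - delta
    mean_x = math.exp(mu + 0.5 * sigma ** 2)
    call_part = delta / (1.0 - delta) * mean_x * (1.0 - PHI((k - mu - sigma ** 2) / sigma))
    floor_part = delta * PHI((k - mu) / sigma)
    return call_part + floor_part


def bound_percent(p_prev, m, s, periods=360):
    """Expected bound x 100 for a horizon of 1/periods year (assumed 360 = daily)."""
    mu, sigma = calibrate(m, s)
    return 100.0 * expected_upper_bound(p_prev, mu / periods, sigma / math.sqrt(periods))


print(round(bound_percent(1, 0.10, 0.10), 4))  # 14.2895
print(round(bound_percent(5, 0.10, 0.10), 4))  # 2.5648
```

Setting `periods=12` or `periods=1` gives the monthly and annual horizons under the same assumed scaling.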


12 Ideally, we would like to characterize \(R_t^o - R_t\) directly, but it is surprisingly difficult
to do so with any degree of generality. However, see the discussion below regarding the
rounding and barrier models; under specific parametric assumptions for \(X_t\), more precise
characterizations of the discreteness bias are available.


13 Note the similarity between (3.3.11) and the Black-Scholes call-option pricing formula.
Tables 3.3a–c report numerical values of the expectation of (3.3.9), as given
by (3.3.11), for price levels \(P_{t-1}\) = $1, $5, $10, $50, $100, and $200, and for
values of \(\mu\) and \(\sigma\) corresponding to annual means and standard deviations
of simple returns ranging from 10% to 50% each, respectively, and then
rescaled to represent daily returns in Table 3.3a, monthly returns in Table
3.3b, and annual returns in Table 3.3c.

Table 3.3a shows that for stocks priced at $1, the expected upper bound
for the discreteness bias is approximately 14 percentage points, a substantial
bias indeed (for a $1 stock, \(\delta_{t-1} = 1/8\), so \(\delta_{t-1}/(1-\delta_{t-1}) = 1/7 \approx 14.3\%\)).
However, this expected upper bound declines to approximately 0.25
percentage points for a $50 stock and is a negligible 0.06 percentage
points for a $200 stock. These upper bounds provide the rationale for
the empirical examples of Figures 3.3a–c and the common intuition that
discreteness has less of an impact on higher-priced stocks. Table 3.3a also
shows that for daily returns, changes in the mean and standard deviation of
returns have relatively little impact on the magnitudes of the upper bounds.

Tables 3.3b and 3.3c indicate that the potential magnitudes of discreteness
bias are relatively stable, increasing only slightly as the return horizon
increases. Whereas the expected upper bound is about 2.5 percentage
points for daily returns when \(P_{t-1}\) = $5, it ranges from 2.8 to 3.9 percentage
points for annual returns. This implies that, as a fraction of the typical
holding-period return, discreteness bias is much less important as the return
horizon increases. Not surprisingly, changes in the mean and standard
deviation of returns have more impact with an annual return horizon.


Rounding Models
Even if \(E|R_t^o - R_t|\) is small, the statistical properties of \(P_t^o\) can still differ in
subtle but important ways from those of \(P_t\). If discreteness is an unavoidable
aspect of the data at hand, it may be necessary to consider a more explicit
statistical model of the discrete price process. As we suggested above, a
rounding model can allow us to infer the parameters of the continuous-state
process from observations of the rounded process. In particular, in much of
the rounding literature it is assumed that \(P_t\) follows a geometric Brownian
motion \(dP_t = \mu P_t\,dt + \sigma P_t\,dW_t\), and the goal is to estimate \(\mu\) and \(\sigma\) from
the observed price process \(P_t^o\). Clearly, the standard volatility estimator \(\hat\sigma\)
based on continuously compounded observed returns will be an inconsistent
estimator of \(\sigma\), converging in probability to \(\sqrt{E[(\log P_t^o - \log P_{t-1}^o)^2]}\) rather
than to \(\sqrt{E[(\log P_t - \log P_{t-1})^2]}\). Moreover, it can be shown that \(\hat\sigma\) will be
an overestimate of \(\sigma\) in the presence of price discreteness (see Ball [1988,
Table 1] and Gottlieb and Kalay [1985, Table I] for approximate magnitudes
of this upward bias). Ball (1988), Cho and Frees (1988), Gottlieb and Kalay


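This is no accident, since \(\mathrm{Max}[X_t, 1-\delta_{t-1}]\) may be rewritten as \(\mathrm{Max}[X_t - (1-\delta_{t-1}), 0] + 1 - \delta_{t-1}\);
hence, apart from the scale factor \(\delta_{t-1}/(1-\delta_{t-1})\) and an additive constant, the upper bound
may be recast as the payoff of a call option on \(X_t\) with strike price \(1-\delta_{t-1}\).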


Table 3.3a. Expected upper bounds for discreteness bias: daily returns

  m        s = 10%   s = 20%   s = 30%   s = 40%   s = 50%
\(P_{t-1}\) = $1
  10%      14.2895   14.2895   14.2895   14.2895   14.2895
  20%      14.2930   14.2930   14.2930   14.2930   14.2930
  30%      14.2961   14.2961   14.2961   14.2961   14.2961
  40%      14.2991   14.2991   14.2991   14.2991   14.2991
  50%      14.3018   14.3018   14.3018   14.3018   14.3018
\(P_{t-1}\) = $5
  10%      2.5648    2.5650    2.5676    2.5721    2.5772
  20%      2.5654    2.5655    2.5672    2.5709    2.5755
  30%      2.5660    2.5660    2.5671    2.5701    2.5741
  40%      2.5665    2.5665    2.5672    2.5605    2.5730
  50%      2.5670    2.5670    2.5674    2.5692    2.5721
\(P_{t-1}\) = $50
  10%      0.2511    0.2516    0.2520    0.2525    0.2529
  20%      0.2511    0.2515    0.2520    0.2524    0.2528
  30%      0.2511    0.2515    0.2519    0.2523    0.2527
  40%      0.2511    0.2515    0.2518    0.2522    0.2526
  50%      0.2511    0.2514    0.2518    0.2521    0.2525
\(P_{t-1}\) = $100
  10%      0.1254    0.1256    0.1259    0.1261    0.1263
  20%      0.1254    0.1256    0.1258    0.1260    0.1262
  30%      0.1254    0.1256    0.1258    0.1260    0.1262
  40%      0.1254    0.1256    0.1258    0.1260    0.1261
  50%      0.1254    0.1256    0.1257    0.1259    0.1261
\(P_{t-1}\) = $200
  10%      0.0627    0.0628    0.0629    0.0630    0.0631
  20%      0.0627    0.0628    0.0629    0.0630    0.0631
  30%      0.0627    0.0628    0.0629    0.0630    0.0631
  40%      0.0627    0.0628    0.0628    0.0629    0.0630
  50%      0.0627    0.0628    0.0628    0.0629    0.0630

Expected upper bounds for discreteness bias in simple returns \(|R_t^o - R_t| \times 100\) under a
geometric random walk for prices \(P_t\) with drift and diffusion parameters \(\mu\) and \(\sigma\) calibrated
to annual mean and standard deviation of simple returns m and s, respectively, each ranging
from 10% to 50%, and then rescaled to match daily data, i.e., \(\mu/360\), \(\sigma/\sqrt{360}\). Discretized
prices \(P_t^o = \lfloor P_t/d \rfloor d\), \(d = 0.125\), are used to calculate returns \(R_t^o = (P_t^o/P_{t-1}^o) - 1\).


Table 3.3b. Expected upper bounds for discreteness bias: monthly returns

  m        s = 10%   s = 20%   s = 30%   s = 40%   s = 50%
\(P_{t-1}\) = $1
  10%      14.3996   14.4007   14.4788   14.6117   14.7626
  20%      14.5044   14.5064   14.5462   14.6449   14.7723
  30%      14.6015   14.6019   14.6219   14.6907   14.7944
  40%      14.6919   14.6920   14.7011   14.7462   14.8272
  50%      14.7767   14.7767   14.7804   14.8081   14.8688
\(P_{t-1}\) = $5
  10%      2.5045    2.6228    2.6501    2.0759    2.7004
  20%      2.6075    2.6300    2.6545    2.6782    2.7010
  30%      2.6222    2.6385    2.6599    2.6816    2.7027
  40%      2.6374    2.6482    2.6664    2.6859    2.7053
  50%      2.6523    2.6589    2.6738    2.6911    2.7088
\(P_{t-1}\) = $50
  10%      0.2544    0.2569    0.2594    0.2619    0.2642
  20%      0.2554    0.2576    0.2599    0.2621    0.2643
  30%      0.2566    0.2584    0.2604    0.2624    0.2645
  40%      0.2580    0.2593    0.2610    0.2620    0.2647
  50%      0.2593    0.2602    0.2617    0.2634    0.2651
\(P_{t-1}\) = $100
  10%      0.1270    0.1283    0.1296    0.1308    0.1319
  20%      0.1276    0.1286    0.1208    0.1309    0.1320
  30%      0.1282    0.1290    0.1300    0.1311    0.1321
  40%      0.1288    0.1295    0.1304    0.1313    0.1322
  50%      0.1295    0.1300    0.1307    0.1315    0.1324
\(P_{t-1}\) = $200
  10%      0.0635    0.0641    0.0647    0.0653    0.0659
  20%      0.0637    0.0643    0.0648    0.0654    0.0659
  30%      0.0640    0.0645    0.0650    0.0655    0.0660
  40%      0.0644    0.0647    0.0651    0.0656    0.0661
  50%      0.0647    0.0649    0.0653    0.0657    0.0661

Expected upper bounds for discreteness bias in simple returns \(|R_t^o - R_t| \times 100\) under a
geometric random walk for prices \(P_t\) with drift and diffusion parameters \(\mu\) and \(\sigma\) calibrated
to annual mean and standard deviation of simple returns m and s, respectively, each ranging
from 10% to 50%, and then rescaled to match monthly data, i.e., \(\mu/12\), \(\sigma/\sqrt{12}\). Discretized
prices \(P_t^o = \lfloor P_t/d \rfloor d\), \(d = 0.125\), are used to calculate returns \(R_t^o = (P_t^o/P_{t-1}^o) - 1\).


Table 3.3c. Expected upper bounds for discreteness bias: annual returns

  m        s = 10%   s = 20%   s = 30%   s = 40%   s = 50%
\(P_{t-1}\) = $1
  10%      15.7285   16.0498   16.5424   ?         17.5247
  20%      17.1430   17.2207   17.5320   17.9288   18.3478
  30%      18.5714   18.5857   18.7221   18.9889   19.3203
  40%      20.0000   20.0014   20.0464   20.1957   20.4209
  50%      21.1286   21.4286   21.4396   21.5080   21.6541
\(P_{t-1}\) = $5
  10%      2.8372    ?         2.9058    3.0815    3.1644
  20%      3.0778    3.1076    3.1677    3.2885    3.3118
  30%      ?         3.3407    3.3736    3.4248    3.4840
  40%      3.5807    3.5909    3.6046    3.0304    3.6807
  50%      3.8402    ?         3.8506    3.8673    3.8008
\(P_{t-1}\) = $50
  10%      0.2775    0.2846    0.2920    ?         ?
  20%      ?         0.3030    0.3097    0.3166    0.3238
  30%      0.3258    0.3206    0.3208    0.3348    0.3407
  40%      0.3501    0.4510    0.3524    0.3555    0.3508
  50%      0.3759    0.3760    0.3764    0.3780    0.3809
\(P_{t-1}\) = $100
  10%      0.1386    0.1421    0.1463    0.1505    0.1545
  20%      0.1502    0.1517    0.1547    0.1581    0.1617
  30%      0.1627    0.1631    0.1647    0.1672    0.1701
  40%      0.1752    0.1753    0.1760    0.1775    0.1707
  50%      0.1877    0.1877    0.1880    0.1888    0.1902
\(P_{t-1}\) = $200
  10%      ?         0.0710    0.0731    0.0759    0.0772
  20%      0.0751    0.0758    0.0773    0.0790    0.0808
  30%      0.0813    0.0815    0.0823    0.0836    0.0850
  40%      0.0870    0.0876    0.0879    0.0887    0.0898
  50%      ?         0.0038    0.0039    ?         0.0051

Expected upper bounds for discreteness bias in simple returns \(|R_t^o - R_t| \times 100\) under a
geometric random walk for prices \(P_t\) with drift and diffusion parameters \(\mu\) and \(\sigma\) calibrated
to annual mean and standard deviation of simple returns m and s, respectively, each ranging
from 10% to 50%. Discretized prices \(P_t^o = \lfloor P_t/d \rfloor d\), \(d = 0.125\), are used to calculate
returns \(R_t^o = (P_t^o/P_{t-1}^o) - 1\).
(1985), and Harris (1990) all provide methods for estimating \(\sigma\) consistently
from the observed price process \(P_t^o\).
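The upward bias of the naive estimator can be illustrated with a small simulation. This is a hypothetical setup, not the design used by Ball (1988) or Gottlieb and Kalay (1985): each draw starts from a price within one tick above $10, applies a normal log return with standard deviation \(\sigma\) = 2.5%, rounds both prices down to multiples of $1/8 as in (3.3.3), and compares the sample standard deviations of the true and observed log returns.

```python
import math
import random


def pop_sd(xs):
    """Population standard deviation (divide by n)."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def naive_vol(p0=10.0, sigma=0.025, d=0.125, n=50_000, seed=0):
    """Standard deviation of true vs. observed (rounded) log returns.

    Hypothetical design: each draw starts from a price uniform on one tick
    above p0 (p0 a multiple of d), applies a N(0, sigma^2) log return, and
    rounds both prices down to multiples of d as in (3.3.3).
    """
    rng = random.Random(seed)
    true_lr, obs_lr = [], []
    for _ in range(n):
        p_prev = p0 + d * rng.random()
        lr = rng.gauss(0.0, sigma)
        p_now = p_prev * math.exp(lr)
        po_prev = math.floor(p_prev / d) * d
        po_now = math.floor(p_now / d) * d
        true_lr.append(lr)
        obs_lr.append(math.log(po_now / po_prev))
    return pop_sd(true_lr), pop_sd(obs_lr)


true_sd, obs_sd = naive_vol()
print(f"true {true_sd:.5f}  observed {obs_sd:.5f}")
```

The gap comes from the rounding errors at both ends of each return, which add variance that the naive estimator attributes to \(\sigma\); it shrinks as the price level rises relative to the tick.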


Barrier Models

A slightly different but closely related set of models of price discreteness,
which we shall call barrier models, has been proposed by Cho and Frees (1988)
and Marsh and Rosenfeld (1986). In these models, the continuous-state
"true" price process \(P_t\) is also a continuous-time process, and trades are
observed whenever \(P_t\) reaches certain levels or barriers.

Marsh and Rosenfeld (1986) place these barriers at multiples of an
eighth, so that conditional on the most recent trade at, say, 402, the waiting
time until the next trade is the first passage time of \(P_t\) to two barriers, one
at 402 and the other at 403 (assuming that \(P_t\) has positive drift).

Cho and Frees (1988) focus on gross returns instead of prices and define
stopping times \(\tau_k\) as

\[ \tau_k \equiv \inf_{t > \tau_{k-1}} \left\{ t : \left| \log \frac{P_t}{P_{\tau_{k-1}}} \right| \ge \log(1+d) \right\}. \tag{3.3.12} \]


Therefore, according to their model, a stock which has just traded at time
\(\tau_{k-1}\) at $10.000 a share will trade next at time \(\tau_k\) when the unobserved
continuous-state gross returns process \(P_t\)/$10.000 reaches either 1.125 or
1/1.125, i.e., when \(P_t\) reaches either $11.250 or $8.888. If \(P_t\) reaches $8.888,
the stock will trade next when \(P_t\) reaches either $10.000 or $7.901, and so
on.

This process captures price discreteness of a very different nature, since
the price increments defined by the stopping times are not integer
multiples of any fixed quantity (for example, the lower barrier 1/1.125 does not
correspond to a one-eighth price decline). However, such an unnatural
definition of discreteness does greatly simplify the characterization of stopping
times and the estimation of the parameters of \(P_t\), since the first difference
of \(\tau_k\) is IID.
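The contrast between the two barrier schemes is easy to see numerically. In the sketch below (hypothetical function names), the Cho and Frees barriers are obtained by multiplying or dividing the last trade price by 1.125, as in the $10.000 example above, while the tick barriers sit one eighth of a dollar away.

```python
def cho_frees_barriers(last_price, ratio=1.125):
    """Upper and lower trigger prices when the gross return must reach ratio or 1/ratio."""
    return last_price * ratio, last_price / ratio


def tick_barriers(last_price, d=0.125):
    """Upper and lower trigger prices one tick above and below the last trade."""
    return last_price + d, last_price - d


up, down = cho_frees_barriers(10.0)
print(round(up, 3), round(down, 3))   # 11.25 8.889
print(cho_frees_barriers(down)[1])    # next lower barrier, about 7.901
print(tick_barriers(10.0))            # (10.125, 9.875)
print((10.0 - down) / 0.125)          # about 8.89 ticks: not a whole number
```

Because the multiplicative barriers scale with the last price, the waiting-time problem looks the same after every trade, which is why the increments of the stopping times are IID in that scheme; the fixed tick barriers lose this scale invariance.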

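Under the more natural specification of price discreteness, not
considered by Cho and Frees (1988), the stopping time becomes

\[ \tau_k \equiv \inf_{t > \tau_{k-1}} \left\{ t : \left| \frac{P_t}{P_{\tau_{k-1}}} - 1 \right| \ge \frac{d}{P_{\tau_{k-1}}} \right\}, \tag{3.3.13} \]

which reduces to the Marsh and Rosenfeld (1986) model, in which the
increments of stopping times are not IID because the barrier \(d/P_{\tau_{k-1}}\)
depends on the price level at the previous trade.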


14 However, see the discussion at the end of Section 3.3.2 for some caveats about the
motivation for these models.
Limitations


Although all of the previous rounding and barrier models do capture price
discreteness and admit consistent estimators of the parameters of the
unobserved continuous-state price process, they suffer from at least three
important limitations.

First, for unobserved price processes other than geometric Brownian
motion, these models and their corresponding parameter estimators
become intractable.

Second, the rounding and barrier models focus exclusively on prices
and allow no role for other economic variables that might influence price
behavior, e.g., bid-ask spreads, volatility, trading volume, etc.

Third, and most importantly, the distinction between the “true” and
observed price is artificial at best, and the economic interpretation of the
two quantities is unclear. For example, Ball (1988), Cho and Frees (1988),
Gottlieb and Kalay (1985), and Harris (1990) all provide methods for
estimating the volatility of a continuous-time price process from discrete
observed prices, never questioning the motivation of this arduous task. If the
continuous-time price process is an approximation to actual market prices,
why is the volatility of the approximating process of interest? One might
argue that derivative pricing models such as the Black-Scholes/Merton
formulas depend on the parameters of such continuous-time processes, but
those models are also approximations to market prices, prices which
exhibit discreteness as well. Therefore, a case must be made for the economic
relevance of the parameters of continuous-state price processes to properly
motivate the statistical models of discreteness in Section 3.3.2.

In the absence of a well-articulated model of "true" price, there is little
basis for arguing that the "true" price is continuous, implying that observed
discrete market prices are somehow less genuine. After all, the economic
definition of price is that quantity of numeraire at which two mutually
consenting economic agents are willing to consummate a trade. Despite the fact
that institutional restrictions may require prices to fall on discrete values,
as long as both buyers and sellers are aware of this discreteness in advance
and are still willing to engage in trade, then discrete prices corresponding
to market trades are "true" prices in every sense.


3.3.3 The Ordered Probit Model


To address the limitations of the rounding and barrier models, Hausman,
Lo, and MacKinlay (1992) propose an alternative in which price changes are
modeled directly using a statistical model known as ordered probit, a technique
used most frequently in empirical studies of dependent variables that take on
only a finite number of values possessing a natural ordering.¹⁵ Heuristically,


15 For example, the dependent variable might be the level of education, as measured by
three categories: less than high school, high school, and college education. The dependent
ordered probit analysis is a generalization of the linear regression model
to cases where the dependent variable is discrete. As such, among the
existing models of stock price discreteness (e.g., Ball [1988], Cho and Frees
[1988], Gottlieb and Kalay [1985], Harris [1990], and Marsh and Rosenfeld
[1986]), ordered probit is the only specification that can easily capture the
impact of “explanatory” variables on price changes while also accounting
for price discreteness and irregular transaction intervals.


The Basic Specification

Specifically, consider a sequence of transaction prices \(P(t_0), P(t_1), \ldots, P(t_n)\)
sampled at times \(t_0, t_1, \ldots, t_n\), and denote by \(Y_1, Y_2, \ldots, Y_n\) the
corresponding price changes, where \(Y_k \equiv P(t_k) - P(t_{k-1})\) is assumed to be an
integer multiple of some divisor, e.g., a tick. Let \(Y_k^*\) denote an unobservable
continuous random variable such that

\[ Y_k^* = X_k'\beta + \epsilon_k, \qquad E[\epsilon_k \mid X_k] = 0, \qquad \epsilon_k \sim \text{INID } \mathcal{N}(0, \sigma_k^2), \tag{3.3.14} \]

where the \((q \times 1)\) vector \(X_k \equiv [X_{1k} \cdots X_{qk}]'\) is a vector of explanatory
variables that determines the conditional mean of \(Y_k^*\) and "INID" indicates
that the \(\epsilon_k\)'s are independently but not identically distributed, an
important difference from standard econometric models which we shall return to
shortly. Note that subscripts are used to denote transaction time, whereas
time arguments \(t_k\) denote calendar or clock time, a convention we shall follow
throughout Section 3.3.3.

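The heart of the ordered probit model is the assumption that observed
price changes \(Y_k\) are related to the continuous variables \(Y_k^*\) in the following
manner:

\[ Y_k = \begin{cases} s_1 & \text{if } Y_k^* \in A_1 \\ s_2 & \text{if } Y_k^* \in A_2 \\ \ \vdots & \qquad \vdots \\ s_m & \text{if } Y_k^* \in A_m \end{cases} \tag{3.3.15} \]

where the sets \(A_j\) form a partition of the state space \(\mathcal{S}^*\) of \(Y_k^*\), i.e.,
\(\mathcal{S}^* = \bigcup_{j=1}^{m} A_j\) and \(A_i \cap A_j = \emptyset\) for \(i \ne j\), and the \(s_j\)'s are the
discrete values that comprise the state space \(\mathcal{S}\) of \(Y_k\).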

The motivation for the ordered probit specification is to uncover the
mapping between \(\mathcal{S}^*\) and \(\mathcal{S}\) and relate it to a set of economic variables. In
Hausman, Lo, and MacKinlay (1992), the \(s_j\)'s are defined as: 0, −1/8, +1/8,
−2/8, +2/8, and so on. For simplicity, the state-space partition of \(\mathcal{S}^*\) is usually
defined to be intervals:

\[ A_1 \equiv (-\infty, \alpha_1] \tag{3.3.16} \]

\[ A_2 \equiv (\alpha_1, \alpha_2] \tag{3.3.17} \]

\[ \vdots \]

\[ A_i \equiv (\alpha_{i-1}, \alpha_i] \tag{3.3.18} \]

\[ \vdots \]

\[ A_m \equiv (\alpha_{m-1}, \infty). \tag{3.3.19} \]

variable is discrete and is naturally ordered since college education always follows high school
(see Maddala [1983] for further details). The ordered probit model was developed by Aitchison
and Silvey (1957) and Ashford (1959), and generalized to nonnormal disturbances by Gurland,
Lee, and Dahm (1960). For more recent extensions, see Maddala (1983), McCullagh (1980),
and Thisted (1991).


Although the observed price change can be any number of ticks,
positive or negative, we assume that \(m\) in (3.3.15) is finite to keep the number of
unknown parameters finite. This poses no difficulties since we may always
let some states in \(\mathcal{S}\) represent a multiple (and possibly countably infinite)
number of values for the observed price change. For example, in the
empirical application of Hausman, Lo, and MacKinlay (1992), \(s_1\) is defined to be
a price change of −4 ticks or less, \(s_9\) to be a price change of +4 ticks or more,
and \(s_2\) to \(s_8\) to be price changes of −3 ticks to +3 ticks, respectively. This
parsimony is obtained at the cost of losing price resolution. That is, under
this specification the ordered probit model does not distinguish between
price changes of +4 and price changes greater than +4, since the +4-tick
outcome and the greater-than-+4-tick outcome have been grouped together
into a common event. The same is true for price changes of −4 ticks and
price changes less than −4. This partitioning is illustrated in Figure 3.4,
which superimposes the partition boundaries \(\{\alpha_i\}\) on the density function
of \(Y_k^*\); the sizes of the regions enclosed by the partitions determine the
probabilities \(\pi_i\) of the discrete events.

Moreover, in principle the resolution may be made arbitrarily finer by
simply introducing more states, i.e., by increasing \(m\). As long as (3.3.14) is
correctly specified, increasing price resolution will not affect the estimated
\(\beta\) asymptotically (although finite-sample properties may differ). However,
in practice the data will impose a limit on the fineness of price resolution
simply because there will be no observations in the extreme states when \(m\)
is too large, in which case a subset of the parameters is not identified and
cannot be estimated.
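The mapping (3.3.15)–(3.3.19) from the latent \(Y_k^*\) to an observed state amounts to locating \(Y_k^*\) among the sorted boundaries. A minimal sketch using Python's `bisect` module; the nine states mirror the Hausman, Lo, and MacKinlay (1992) grouping described above, but the boundary values are purely illustrative, not estimates.

```python
import bisect


def observed_state(y_star, alphas, states):
    """Map latent Y*_k to its observed state via the partition (3.3.16)-(3.3.19).

    `alphas` holds the increasing boundaries alpha_1 < ... < alpha_{m-1}; the
    intervals are right-closed, so a value equal to alpha_i falls in A_i.
    """
    if len(states) != len(alphas) + 1:
        raise ValueError("need exactly one more state than boundaries")
    # bisect_left counts the boundaries strictly below y_star, which is the
    # zero-based position of y_star's interval in the list of states.
    return states[bisect.bisect_left(alphas, y_star)]


# Nine states as in the grouping described above; boundaries are illustrative only.
STATES = ["<=-4", -3, -2, -1, 0, 1, 2, 3, ">=+4"]
ALPHAS = [k - 0.5 for k in range(-3, 5)]  # -3.5, -2.5, ..., 3.5

print(observed_state(0.2, ALPHAS, STATES))  # 0
print(observed_state(1.0, ALPHAS, STATES))  # 1
print(observed_state(7.3, ALPHAS, STATES))  # >=+4
```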


The Conditional Distribution of Price Changes

Observe that the \(\epsilon_k\)'s in (3.3.14) are assumed to be nonidentically
distributed, conditioned on the \(X_k\)'s. The need for this somewhat nonstandard
assumption comes from the irregular and random spacing of transactions
data. If, for example, transaction prices were determined by the model in
Marsh and Rosenfeld (1986), where the \(Y_k^*\)'s are increments of arithmetic
Brownian motion with variance proportional to \(\Delta t_k \equiv t_k - t_{k-1}\), then \(\sigma_k^2\) must be
a linear function of \(\Delta t_k\), which varies from one transaction to the next.

Figure 3.4. The Ordered Probit Model

More generally, to allow for more general forms of conditional het- 
eroskedasticity, let us assume that of is a linear function of a vector of pre- 
determined variables W. = [ Wı, .. Wr, ]' so that 


Ele, Wal = 0, „ INIDA(, o) (3.3.20) 
of = ур+уүйй+ E. (3.3420) 


where (3.3.20) replaces the corresponding hypothesis in (3.3.14) and the conditional volatility coefficients $\{\gamma_j\}$ are squared in (3.3.21) to ensure that the conditional volatility is nonnegative. In this more general framework, the arithmetic Brownian motion model of Marsh and Rosenfeld (1986) can be easily accommodated by setting


$$ X_k = \Delta t_k, \tag{3.3.22} $$

$$ \sigma_k^2 = \gamma_1^2 W_k. \tag{3.3.23} $$


In this case, $W_k$ contains only one variable, $\Delta t_k$ (which is also the only variable contained in $X_k$). The fact that the same variable is included in both $X_k$ and $W_k$ does not create perfect multicollinearity, since one vector affects the conditional mean of $Y_k^*$ while the other affects the conditional variance.
The dependence structure of the observed process $Y_k$ is clearly induced by that of $Y_k^*$ and the definitions of the $A_i$'s, since

$$ P(Y_k = s_i, \; Y_l = s_j) = P(Y_k^* \in A_i, \; Y_l^* \in A_j). \tag{3.3.24} $$


As a consequence, if $X_k$ and $W_k$ are temporally independent, the observed process $Y_k$ is also temporally independent. Of course, these are fairly restrictive assumptions and are certainly not necessary for any of the statistical inferences that follow. We require only that the $\epsilon_k$'s be conditionally independent, so that all serial dependence is captured by the $X_k$'s and the $W_k$'s. Consequently, the independence of the $\epsilon_k$'s does not imply that the $Y_k$'s are independently distributed, because no restrictions have been placed on the temporal dependence of the $X_k$'s or $W_k$'s.

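The conditional distribution of observed price changes $Y_k$, conditioned on $X_k$ and $W_k$, is determined by the partition boundaries and the particular distribution of $\epsilon_k$. For normal $\epsilon_k$'s, the conditional distribution is

$$
\begin{aligned}
P(Y_k = s_i \mid X_k, W_k) &= P(X_k'\beta + \epsilon_k \in A_i \mid X_k, W_k) && (3.3.25)\\
&= \begin{cases}
P(X_k'\beta + \epsilon_k \le \alpha_1 \mid X_k, W_k) & \text{if } i = 1,\\
P(\alpha_{i-1} < X_k'\beta + \epsilon_k \le \alpha_i \mid X_k, W_k) & \text{if } 1 < i < m,\\
P(\alpha_{m-1} < X_k'\beta + \epsilon_k \mid X_k, W_k) & \text{if } i = m,
\end{cases} && (3.3.26)\\
&= \begin{cases}
\Phi\!\left(\dfrac{\alpha_1 - X_k'\beta}{\sigma_k(W_k)}\right) & \text{if } i = 1,\\[1ex]
\Phi\!\left(\dfrac{\alpha_i - X_k'\beta}{\sigma_k(W_k)}\right) - \Phi\!\left(\dfrac{\alpha_{i-1} - X_k'\beta}{\sigma_k(W_k)}\right) & \text{if } 1 < i < m,\\[1ex]
1 - \Phi\!\left(\dfrac{\alpha_{m-1} - X_k'\beta}{\sigma_k(W_k)}\right) & \text{if } i = m,
\end{cases} && (3.3.27)
\end{aligned}
$$

where $\sigma_k(W_k)$ is written as a function of $W_k$ to show how the conditioning variables enter the conditional distribution, and $\Phi(\cdot)$ is the standard normal cumulative distribution function.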
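Because the three cases of (3.3.27) are just differences of $\Phi$ evaluated at consecutive standardized boundaries (with $\Phi(-\infty) = 0$ and $\Phi(+\infty) = 1$), they are easy to evaluate numerically. The following Python sketch is an illustration rather than the authors' code; the function names are hypothetical, and it takes the conditional mean $X_k'\beta$ and the conditional standard deviation $\sigma_k(W_k)$ as already computed.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(alphas, mean, sigma):
    """State probabilities P(Y_k = s_i | X_k, W_k), i = 1..m, as in (3.3.27).

    alphas: strictly increasing boundaries alpha_1 < ... < alpha_{m-1}
    mean:   conditional mean X_k' beta
    sigma:  conditional standard deviation sigma_k(W_k) > 0
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if any(a >= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("boundaries must be strictly increasing")
    # Phi at each standardized boundary, padded with Phi(-inf)=0 and Phi(+inf)=1
    cdf = [0.0] + [std_normal_cdf((a - mean) / sigma) for a in alphas] + [1.0]
    # Consecutive differences give the m state probabilities, which sum to one
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Five states with boundaries at -1.5, -0.5, 0.5, 1.5 (illustrative values)
print(ordered_probit_probs([-1.5, -0.5, 0.5, 1.5], mean=0.0, sigma=1.0))
print(ordered_probit_probs([-1.5, -0.5, 0.5, 1.5], mean=0.5, sigma=1.0))
```

Raising the conditional mean from 0 to 0.5 with the boundaries held fixed moves probability mass toward the upper states, which is the intuition developed in the next paragraphs.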

To develop some intuition for the ordered probit model, observe that the probability of any particular observed price change is determined by where the conditional mean lies relative to the partition boundaries. Therefore, for a given conditional mean $X_k'\beta$, shifting the boundaries will alter the probabilities of observing each state (see Figure 3.4).

In fact, by shifting the boundaries appropriately, ordered probit can fit any arbitrary multinomial distribution. This implies that the assumption of normality underlying ordered probit plays no special role in determining the probabilities of states; a logistic distribution, for example, could have served equally well. However, since it is considerably more difficult to capture conditional heteroskedasticity in the ordered logit model, we have chosen the normal distribution.

Given the partition boundaries, a higher conditional mean $X_k'\beta$ implies a higher probability of observing a more extreme positive state. Of course, the labeling of states is arbitrary, but the ordered probit model makes use of the natural ordering of the states. The regressors allow us to separate the effects of various economic factors that influence the likelihood of one state versus another. For example, suppose that a large positive value of $X_{jk}$ usually implies a large negative observed price change and vice versa. Then the ordered probit coefficient $\beta_j$ will be negative in sign and large in magnitude (relative to $\sigma_k$, of course).

By allowing the data to determine the partition boundaries $\alpha$, the coefficients $\beta$ of the conditional mean, and the conditional variance $\sigma_k^2$, the ordered probit model captures the empirical relation between the unobservable continuous state space $\mathcal{S}^*$ and the observed discrete state space $\mathcal{S}$ as a function of the economic variables $X_k$ and $W_k$.


Maximum Likelihood Estimation 

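Let $I_k(i)$ be an indicator variable which takes on the value one if the realization of the $k$th observation $Y_k$ is the $i$th state $s_i$, and zero otherwise. Then the log-likelihood function $\mathcal{L}$ for the vector of price changes $Y = [Y_1 \; Y_2 \; \cdots \; Y_n]'$, conditional on the explanatory variables $X = [X_1 \; X_2 \; \cdots \; X_n]'$ and $W = [W_1 \; W_2 \; \cdots \; W_n]'$, is given by

$$
\begin{aligned}
\mathcal{L}(Y \mid X, W) = \sum_{k=1}^{n} \Bigg\{ & I_k(1) \log \Phi\!\left(\frac{\alpha_1 - X_k'\beta}{\sigma_k(W_k)}\right) \\
&+ \sum_{i=2}^{m-1} I_k(i) \log\left[\Phi\!\left(\frac{\alpha_i - X_k'\beta}{\sigma_k(W_k)}\right) - \Phi\!\left(\frac{\alpha_{i-1} - X_k'\beta}{\sigma_k(W_k)}\right)\right] \\
&+ I_k(m) \log\left[1 - \Phi\!\left(\frac{\alpha_{m-1} - X_k'\beta}{\sigma_k(W_k)}\right)\right] \Bigg\}.
\end{aligned}
\tag{3.3.28}
$$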

Although $\sigma_k^2$ is allowed to vary linearly with $W_k$, there are some constraints that must be placed on the parameters to achieve identification since, for example, doubling the $\alpha$'s, the $\beta$'s, and $\sigma_k$ leaves the likelihood unchanged. A typical identification assumption is to set $\gamma_0 = 1$. We are then left with three issues that must be resolved before estimation is possible: (i) the number of states $m$; (ii) the specification of the regressors $X_k$; and (iii) the specification of the conditional variance $\sigma_k^2$.

In choosing $m$, we must balance price resolution against the practical constraint that too large an $m$ will yield no observations in the extreme states $s_1$ and $s_m$. For example, if we set $m$ to 101 and define the states $s_1$ and $s_{101}$ symmetrically to be price changes of −50 ticks and +50 ticks, respectively, we would find no $Y_k$'s among typical NYSE stock transactions falling into either of these states, and it would be impossible to estimate the parameters associated with these two states. Perhaps the easiest method for determining $m$ is to use the empirical frequency distribution of the dataset as a guide, setting $m$ as large as possible, but not so large that the extreme states have no observations in them.
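One way to operationalize this rule of thumb, assuming the states are defined symmetrically in whole ticks with open-ended extreme states "$-c$ ticks or less" and "$+c$ ticks or more" (so that $m = 2c + 1$), is to increase $c$ until one of the two tails becomes too sparse. The sketch below is hypothetical: the function name and the `min_count` threshold are illustrative choices, not part of the original procedure.

```python
from collections import Counter

def choose_symmetric_m(tick_changes, min_count=1, max_cutoff=50):
    """Largest m = 2c + 1 whose extreme states (<= -c ticks, >= +c ticks)
    each hold at least min_count observations; None if even c = 1 fails."""
    counts = Counter(tick_changes)
    best = None
    for c in range(1, max_cutoff + 1):
        lower_tail = sum(n for d, n in counts.items() if d <= -c)
        upper_tail = sum(n for d, n in counts.items() if d >= c)
        if lower_tail >= min_count and upper_tail >= min_count:
            best = 2 * c + 1
        else:
            break  # tails only shrink as c grows, so stop at the first failure
    return best

# Hypothetical price changes in ticks
data = [0] * 50 + [1] * 20 + [-1] * 20 + [2] * 5 + [-2] * 5 + [3] * 2 + [-3] + [5]
print(choose_symmetric_m(data))               # extreme states at -3/+3 ticks
print(choose_symmetric_m(data, min_count=5))  # extreme states at -2/+2 ticks
```

Requiring more than one observation per extreme state is a cautious choice, since a state with a single observation contributes almost no information about its boundary.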

For example, Hausman, Lo, and MacKinlay (1992) set $m = 9$ for the larger stocks, implying extreme states of −4 ticks or less and +4 ticks or more, and set $m = 5$ for the smaller stocks, implying extreme states of −2 ticks or less and +2 ticks or more. Note that although the definition of states need not be symmetric (state $s_1$ can be −6 ticks or less, implying that state $s_9$ is +2 ticks or more), the symmetry of the histogram of price changes in their dataset suggests a symmetric definition of the $s_i$'s.

The remaining two issues must be resolved on a case-by-case basis, since the specifications of the regressors and $\sigma_k^2$ are dictated largely by the particular application at hand. For forecasting purposes, lagged price changes and market indexes may be appropriate regressors, but for estimating a structural model of marketmaker monopoly power, other variables might be more appropriate.


3.4 Recent Empirical Findings


The empirical market microstructure literature is an extensive one, straddling both academic and industry publications, and it is difficult if not impossible to provide even a superficial review in a few pages. Instead, we shall present three specific market microstructure applications in this section, each in some depth, to give readers a more concrete illustration of empirical research in this exciting and rapidly growing literature. Section 3.4.1 provides an empirical analysis of nonsynchronous trading in which the magnitude of the nontrading bias is measured using daily, weekly, and monthly stock returns. Section 3.4.2 reviews the empirical analysis of effective bid-ask spreads based on the model in Roll (1984). And Section 3.4.3 presents an application of the ordered probit model to transactions data.


3.4.1 Nonsynchronous Trading


Before considering the empirical evidence for nontrading effects, we summarize the qualitative implications of the nontrading model of Section 3.1.1. Although many of these implications are consistent with other models of nonsynchronous trading, the sharp comparative static results and expositional simplicity are unique to this framework. Under the assumptions of Section 3.1.1, the presence of nonsynchronous trading:


1. does not affect the mean of either observed individual or portfolio returns.
2. increases the variance of observed individual security returns that have nonzero means. The smaller the mean, the smaller the increase in the variance of observed returns.
3. decreases the variance of observed portfolio returns when portfolios are well-diversified and consist of securities with common nontrading probability.
4. induces geometrically declining negative serial correlation in observed individual security returns that have nonzero means. The smaller the absolute value of the mean, the closer is the autocorrelation to zero.
5. induces geometrically declining positive serial correlation in observed portfolio returns when portfolios are well-diversified and consist of securities with a common nontrading probability, yielding an AR(1) for the observed returns process.
6. induces geometrically declining cross-autocorrelation between observed returns of securities $i$ and $j$ which is of the same sign as $\beta_i\beta_j$. This cross-autocorrelation is generally asymmetric: the covariance of current observed returns to $i$ with future observed returns to $j$ need not be the same as the covariance of current observed returns to $j$ with future observed returns to $i$. The asymmetry arises from the fact that different securities may have different nontrading probabilities.
7. induces geometrically declining positive cross-autocorrelation between observed returns of portfolios $A$ and $B$ when portfolios are well-diversified and consist of securities with common nontrading probabilities. This cross-autocorrelation is also asymmetric and arises from the fact that securities in different portfolios may have different nontrading probabilities.
8. induces positive serial dependence in an equal-weighted index if the betas of the securities are generally of the same sign, and if individual returns have small means.
9. and time aggregation increases the maximal nontrading-induced negative autocorrelation in observed individual security returns, but this maximal negative autocorrelation is attained at nontrading probabilities increasingly closer to unity as the degree of aggregation increases.
10. and time aggregation decreases the nontrading-induced autocorrelation in observed portfolio returns for all nontrading probabilities.
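Implication 5 can be checked with a small simulation. The sketch below rests on simplifying assumptions that are ours, not the text's: every security's virtual return is the same standard normal factor (unit beta, zero mean, no idiosyncratic noise), all securities share one nontrading probability $\pi$, and a security that trades reports the sum of all virtual returns accumulated since its last trade, while a nontrading security reports zero. Under these assumptions the observed portfolio return should behave like an AR(1) with coefficient close to $\pi$ (slightly below it for a finite number of securities).

```python
import random

def simulate_portfolio_returns(pi, n_secs=100, n_periods=5000, seed=0):
    """Observed equal-weighted portfolio returns under random nontrading."""
    rng = random.Random(seed)
    pending = [0.0] * n_secs          # virtual returns not yet reflected in prices
    observed = []
    for _ in range(n_periods):
        factor = rng.gauss(0.0, 1.0)  # common virtual return this period
        total = 0.0
        for i in range(n_secs):
            pending[i] += factor
            if rng.random() >= pi:    # security trades with probability 1 - pi
                total += pending[i]
                pending[i] = 0.0
        observed.append(total / n_secs)  # nontraders contribute a zero return
    return observed

def autocorr1(x):
    """Sample first-order autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

rho = autocorr1(simulate_portfolio_returns(pi=0.5))
print(round(rho, 3))  # close to, and slightly below, 0.5
```

With zero means, implication 4 predicts essentially no autocorrelation in each individual security's observed returns, so the positive portfolio autocorrelation here comes entirely from the cross-autocorrelations described in implications 6 and 7.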


Since the effects of nonsynchronous trading are more apparent in securities grouped by nontrading probabilities than in individual stocks, our empirical application uses the returns of ten size-sorted portfolios for daily, weekly, and monthly data from 1962 to 1994. We use market capitalization to group securities because the relative thinness of the market for any given stock is highly correlated with the stock's total market value; hence stocks with similar market values are likely to have similar nontrading probabilities. We choose to form ten portfolios to maximize the homogeneity of nontrading probabilities within each portfolio while still maintaining reasonable diversification so that the asymptotic approximation of (3.1.20) might still obtain.


Daily Nontrading Probabilities Implicit in Autocorrelations

Table 3.4 reports first-order autocorrelation matrices $\hat{\Gamma}_1$ for the vector of four of the ten size-sorted portfolio returns using daily, weekly, and monthly data taken from the Center for Research in Security Prices (CRSP) database. Portfolio 1 contains stocks with the smallest market values and portfolio 10 contains those with the largest. From casual inspection it is apparent that these autocorrelation matrices are not symmetric. The second column of matrices is the autocorrelation matrices minus their transposes, and it is evident that elements below the diagonal dominate those above it. This confirms the lead-lag pattern reported in Lo and MacKinlay (1990c).

The fact that the returns of large stocks tend to lead those of smaller stocks does suggest that nonsynchronous trading may be a source of correlation. However, the magnitudes of the autocorrelations for weekly and monthly returns imply an implausible level of nontrading. This is most evident in Table 3.5, which reports estimates of daily nontrading probabilities implicit in the weekly and monthly own-autocorrelations of Table 3.4. For example, using (3.1.40) the daily nontrading probability implied by an estimated weekly autocorrelation of 37% for portfolio 1 is estimated to be 71.7%. Using (3.1.8) we estimate the average time between trades to be 2.5 days. The corresponding daily nontrading probability is 86.6% using monthly returns, implying an average nontrading duration of 6.5 days.

Only ordinary common shares are included in this analysis. Excluded are American Depository Receipts (ADRs) and other specialized securities where using market value to characterize nontrading is less meaningful.

The returns to these portfolios are continuously compounded returns of individual simple returns arithmetically averaged. We have repeated the correlation analysis for continuously compounded returns of portfolios whose values are calculated as unweighted geometric averages of included securities' prices. The results for these portfolio returns are practically identical to those for the continuously compounded returns of equal-weighted portfolios.

We report only a subset of four portfolios for the sake of brevity.

Standard errors for autocorrelation-based probability and nontrading duration estimates are obtained by applying a first-order Taylor expansion (see Section A.4 of the Appendix) to (3.1.8) and (3.1.40) using heteroskedasticity- and autocorrelation-consistent standard errors for daily, weekly, and monthly first-order autocorrelation coefficients. These latter standard errors are computed by regressing returns on a constant and lagged returns, and using Newey and West's (1987) procedure to calculate heteroskedasticity- and autocorrelation-consistent standard errors for the slope coefficient (which is simply the first-order autocorrelation coefficient of returns).

Table 3.4. Autocorrelation matrices for size-sorted portfolio returns.

                         Γ̂₁                              Γ̂₁ − Γ̂₁′
                 1      4      7      10           1      4      7      10
Daily     1     .39    .29    .21    .07          .00   −.13   −.19   −.27
          4     .41    .34    .29    .11          .13    .00   −.09   −.25
          7     .40    .38    .33    .15          .19    .09    .00   −.19
         10     .34    .36    .34    .19          .27    .25    .19    .00

Weekly    1     .37    .19    .12   −.02          .00   −.15   −.20   −.26
          4     .34    .21    .15   −.00          .15    .00   −.08   −.19
          7     .32    .23    .17    .02          .20    .08    .00   −.14
         10     .24    .19    .15    .01          .26    .19    .14    .00

Monthly   1     .21    .10    .06    .01          .00   −.20   −.25   −.29
          4      ?     .16    .11     ?           .20    .00   −.08   −.14
          7     .30     ?     .14    .05          .24    .08    .00   −.09
         10     .27    .18    .14    .03          .29    .14    .09    .00

Sample first-order autocorrelation matrix $\hat{\Gamma}_1$ for the $(4 \times 1)$ subvector $[r_{1t} \; r_{4t} \; r_{7t} \; r_{10t}]'$ of observed returns to ten equal-weighted size-sorted portfolios using daily, weekly, and monthly NYSE-AMEX common stock returns data from the CRSP files for the time period July 3, 1962 to December 30, 1994. Stocks are assigned to portfolios annually using the market value at the end of the prior year. If this market value is missing, the end-of-year market value is used. If both market values are missing, the stock is not included. Only securities with complete daily return histories within a given month are included in the daily returns calculations. $r_{1t}$ is the return to the portfolio containing securities with the smallest market values and $r_{10t}$ is the return to the portfolio of securities with the largest. There are approximately equal numbers of securities in each portfolio. The entry in the $i$th row and $j$th column is the correlation between $r_{i,t-1}$ and $r_{jt}$. To gauge the degree of asymmetry in these autocorrelation matrices, the difference $\hat{\Gamma}_1 - \hat{\Gamma}_1'$ is also reported.
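The step from an aggregated autocorrelation back to a daily nontrading probability can be sketched numerically. For a well-diversified portfolio whose daily observed returns have autocorrelations $\pi^j$, summing $q$ daily returns gives a first-order autocorrelation of $\pi(1-\pi^q)^2 / [\,q(1-\pi^2) - 2\pi(1-\pi^q)\,]$. That closed form is our own derivation and should be checked against (3.1.40); it does reproduce the 71.7% and 86.6% figures quoted above to within rounding. The expected nontrading duration $\pi/(1-\pi)$ matches the $E[k]$ column of Table 3.5. The bisection inversion assumes the autocorrelation increases with $\pi$, and the function names are hypothetical.

```python
def aggregated_autocorr(pi, q):
    """First-order autocorrelation of q-day sums when daily autocorrelations are pi**j."""
    piq = pi ** q
    num = pi * (1.0 - piq) ** 2
    den = q * (1.0 - pi ** 2) - 2.0 * pi * (1.0 - piq)
    return num / den

def implied_nontrading_prob(rho, q, tol=1e-12):
    """Invert aggregated_autocorr for pi by bisection (assumes monotonicity in pi)."""
    lo, hi = 1e-12, 0.999
    if not aggregated_autocorr(lo, q) < rho < aggregated_autocorr(hi, q):
        raise ValueError("rho is outside the range attainable by nontrading")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if aggregated_autocorr(mid, q) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_nontrading_duration(pi):
    """Expected number of consecutive days without trading, pi / (1 - pi)."""
    return pi / (1.0 - pi)

pi_weekly = implied_nontrading_prob(0.37, q=5)    # portfolio 1, weekly
pi_monthly = implied_nontrading_prob(0.21, q=22)  # portfolio 1, monthly
print(round(pi_weekly, 3), round(expected_nontrading_duration(pi_weekly), 2))
print(round(pi_monthly, 3), round(expected_nontrading_duration(pi_monthly), 2))
```

With $q = 1$ the formula collapses to $\pi$ itself, which is why daily own-autocorrelations can be read directly as nontrading probabilities.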

For comparison, Table 3.5 also reports estimates of the nontrading probabilities using daily data and using trade information from the CRSP files. In the absence of time aggregation, own-autocorrelations of portfolio returns are consistent estimators of nontrading probabilities; thus the entries in the column of Table 3.5 labelled "$\hat{\pi}_i$ ($q = 1$)" are simply taken from the diagonal of the daily autocorrelation matrix in Table 3.4.

For the smaller securities, the point estimates yield plausible nontrading durations, but the estimated durations decline only marginally for larger-size portfolios. A duration of nearly one fourth of a day is much too large for securities in the largest portfolio. More direct evidence is provided in the column labelled $\hat{f}_i$, which reports the average fraction of securities in a given portfolio that do not trade during each trading day. This average is computed over all trading days from July 3, 1962 to December 30, 1994 (8,179 observations). Comparing the entries in this column with those in the others shows the limitations of nontrading as an explanation for the autocorrelations in the data. Nontrading may be responsible for some of the time-series properties of stock returns but cannot be the only source of autocorrelation.

Table 3.5. Estimates of daily nontrading probabilities.

Portfolio   f̂ᵢ        π̂ᵢ (q=1)  E[k]     π̂ᵢ (q=5)  E[k]    π̂ᵢ (q=22)  E[k]
    1       .225      .394      0.65     .717      2.54    .866       6.47
           (0.013)   (0.026)   (0.07)   (0.04)    (0.42)  (0.029)    (1.64)
    4       .052      .343      0.52     .560      1.28    .837       5.12
           (0.004)   (0.023)   (0.05)   (0.050)   (0.31)  (0.043)    (1.61)
    7       .010      .33       0.49     .497      0.99    .819       4.52
           (0.002)   (0.016)   (0.04)   (0.062)   (0.25)  (0.048)    (1.46)
   10       .002      .188      0.23     .139      0.16    .515       1.06
           (0.001)   (0.019)   (0.033)  (0.126)   (0.17)  (0.461)    (1.96)

Estimates of daily nontrading probabilities implicit in ten weekly and monthly size-sorted portfolio return autocorrelations. Entries in the column labelled "$\hat{f}_i$" are averages of the fraction of securities in portfolio $i$ that did not trade on each trading day, where the average is computed over all trading days from July 3, 1962 to December 30, 1994. Entries in the "$\hat{\pi}_i$ ($q = 1$)" column are the first-order autocorrelation coefficients of daily portfolio returns, which are consistent estimators of daily nontrading probabilities. Entries in the "$\hat{\pi}_i$ ($q = 5$)" and "$\hat{\pi}_i$ ($q = 22$)" columns are estimates of daily nontrading probabilities obtained from first-order weekly and monthly portfolio return autocorrelation coefficients, using the time aggregation relations of Section 3.1 ($q = 5$ for weekly returns and $q = 22$ for monthly returns, since there are 5 and 22 trading days in a week and a month, respectively). Entries in columns labelled "$E[k]$" are estimates of the expected number of consecutive days without trading implied by the probability estimates in the columns to the immediate left. Standard errors are reported in parentheses; all are heteroskedasticity- and autocorrelation-consistent.


Trade information is provided in the CRSP daily files, in which the closing price of a security is reported to be the negative of the average of the bid and ask prices on days when that security did not trade. Standard errors for probability estimates are based on the daily time series of the fraction of securities that did not trade. The standard errors are heteroskedasticity- and autocorrelation-consistent.
Table 3.6. Nontrading-implied weekly index autocorrelations.

Nontrading Estimator         Implied Index ρ̂₁ (%)      Implied Index ρ̂₁ (%)
                             (β₁ = ⋯ = β₁₀ = 1)         (β₁ = 1.5, …, β₁₀ = 0.5)
Negative share price         1.4                        1.8
Daily autocorrelation        4.8                        5.9

Implied first-order autocorrelation $\rho_1$ of weekly returns of an equal-weighted portfolio of ten size-sorted portfolios (which approximates an equal-weighted portfolio of all securities), using two different estimators of daily nontrading probabilities for the portfolios: the average fraction of negative share prices reported by CRSP, and daily nontrading probabilities implied by first-order autocorrelations of daily returns. Since the index autocorrelation depends on the betas of the ten portfolios, it is computed for two sets of betas, one in which all betas are set to 1.0 and another in which the betas decline linearly from $\beta_1 = 1.5$ to $\beta_{10} = 0.5$. The sample weekly autocorrelation for an equal-weighted portfolio of the ten portfolios is 0.21. Results are based on data from July 3, 1962 to December 30, 1994.


Nonsynchronous Trading and Index Autocorrelation

Denote by $r^o_{mt}$ the observed return in period $t$ to an equal-weighted portfolio of all $N$ securities. Its autocovariance and autocorrelation are readily shown to be

$$ \operatorname{Cov}[r^o_{mt}, r^o_{m,t+1}] = \frac{\iota'\Gamma_1\iota}{N^2}, \qquad \operatorname{Corr}[r^o_{mt}, r^o_{m,t+1}] = \frac{\iota'\Gamma_1\iota}{\iota'\Gamma_0\iota}, \tag{3.4.1} $$

where $\Gamma_0$ is the contemporaneous covariance matrix of $r^o_t$, $\Gamma_1$ is its first-order autocovariance matrix, and $\iota$ is an $(N \times 1)$ vector of ones. If the betas of the securities are generally of the same sign and if the mean return of each security is small, then $r^o_{mt}$ is likely to be positively autocorrelated. Alternatively, if the cross-autocovariances are positive and dominate the negative own-autocovariances, the equal-weighted index will exhibit positive serial dependence. Can this explain Lo and MacKinlay's (1988b) strong rejection of the random walk hypothesis for the CRSP weekly equal-weighted index, which exhibits a first-order autocorrelation over 20%?
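Equation (3.4.1) says the index autocorrelation is just the sum of all elements of $\Gamma_1$ divided by the sum of all elements of $\Gamma_0$. A minimal sketch, with hypothetical function names and a made-up two-security example in which the own-autocovariances are negative but a positive cross-autocovariance dominates them, so the index is positively autocorrelated:

```python
def index_autocovariance(gamma1):
    """Cov[r_mt, r_m,t+1] = iota' Gamma_1 iota / N^2 for an equal-weighted index."""
    n = len(gamma1)
    return sum(sum(row) for row in gamma1) / n ** 2

def index_autocorrelation(gamma0, gamma1):
    """Corr[r_mt, r_m,t+1] = iota' Gamma_1 iota / iota' Gamma_0 iota, as in (3.4.1).

    The element sum is unchanged by transposition, so it does not matter
    which lag convention is used for the rows of gamma1.
    """
    total0 = sum(sum(row) for row in gamma0)  # iota' Gamma_0 iota
    total1 = sum(sum(row) for row in gamma1)  # iota' Gamma_1 iota
    return total1 / total0

# Hypothetical covariance matrices for N = 2 securities
gamma0 = [[1.0, 0.5], [0.5, 1.0]]
gamma1 = [[-0.1, 0.4], [0.0, -0.1]]  # negative own, positive cross terms
print(index_autocovariance(gamma1), index_autocorrelation(gamma0, gamma1))
```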

With little loss in generality we let $N = 10$ and consider the equal-weighted portfolio of the ten size-sorted portfolios, which is an approximately equal-weighted portfolio of all securities. Using (3.1.36) we may calculate the weekly autocorrelation of $r^o_{mt}$ induced by particular daily nontrading probabilities $\pi_i$ and beta coefficients $\beta_i$. To do this, we need to select empirically plausible values for $\pi_i$ and $\beta_i$, $i = 1, 2, \ldots, 10$. This is done in Table 3.6 using two different methods of estimating the $\pi_i$'s and two different assumptions for the $\beta_i$'s.

The first row corresponds to weekly autocorrelations computed with the nontrading probabilities obtained from the fractions of negative share prices reported by CRSP (see Table 3.5). The first entry, 1.4%, is the first-order autocorrelation of the weekly equal-weighted index assuming that all ten portfolio betas are 1.0, and the second entry, 1.8%, is computed under the alternative assumption that the betas decline linearly from $\beta_1 = 1.5$ for the portfolio of smallest stocks to $\beta_{10} = 0.5$ for the portfolio of the largest. The second row reports similar autocorrelations implied by nontrading probabilities estimated from daily autocorrelations using (3.1.41).

The largest implied first-order autocorrelation for the weekly equal-weighted returns index reported in Table 3.6 is only 5.9%. Using direct estimates of nontrading via negative share prices yields an autocorrelation of less than 2%. These magnitudes are still considerably smaller than the 21% sample autocorrelation of the equal-weighted index return. In summary, the recent empirical evidence provides little support for nontrading as an important source of spurious correlation in the returns of common stock over daily and longer frequencies.


3.4.2 Estimating the Effective Bid-Ask Spread 


In implementing the model of Section 3.2.1, Roll (1984) argues that the percentage bid-ask spread $\tilde{s}$ may be more easily interpreted than the absolute bid-ask spread $s$, and he shows that the first-order autocovariance of simple returns is related to $\tilde{s}$ in the following way:

$$ \operatorname{Cov}[R_t, R_{t+1}] = -\frac{\tilde{s}^2}{4} - \frac{\tilde{s}^4}{16} \approx -\frac{\tilde{s}^2}{4}, \tag{3.4.2} $$

$$ \tilde{s} \equiv \frac{s}{\sqrt{P_a P_b}}, \tag{3.4.3} $$

where $\tilde{s}$ is defined as a percentage of the geometric average of the bid and ask prices $P_b$ and $P_a$. Using the approximation in (3.4.2), the percentage spread may be recovered as

$$ \tilde{s} \approx 2\sqrt{-\operatorname{Cov}[R_t, R_{t+1}]}. \tag{3.4.4} $$
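Relations (3.4.2) through (3.4.4) can be verified by brute force. Under the simplest version of Roll's model, assumed here for illustration (a constant fundamental value, with each transaction occurring at the bid or the ask with probability 1/2, independently over time), there are only eight equally likely price paths $(P_{t-1}, P_t, P_{t+1})$, so the exact return autocovariance can be computed by enumeration and compared with the closed form. The function name and the quotes used are hypothetical.

```python
import itertools
import math

def roll_return_autocov(bid, ask):
    """Exact Cov[R_t, R_{t+1}] of simple returns when every trade is at the bid
    or the ask with probability 1/2, independently, and the true price is fixed."""
    paths = list(itertools.product([bid, ask], repeat=3))  # (P_{t-1}, P_t, P_{t+1})
    r1 = [p1 / p0 - 1.0 for p0, p1, _ in paths]             # R_t on each path
    r2 = [p2 / p1 - 1.0 for _, p1, p2 in paths]             # R_{t+1} on each path
    n = len(paths)                                          # 8 equally likely paths
    m1, m2 = sum(r1) / n, sum(r2) / n
    return sum((a - m1) * (b - m2) for a, b in zip(r1, r2)) / n

bid, ask = 49.75, 50.25
s_pct = (ask - bid) / math.sqrt(ask * bid)              # (3.4.3)
cov = roll_return_autocov(bid, ask)
print(cov, -s_pct ** 2 / 4 - s_pct ** 4 / 16)           # (3.4.2): equal
print(s_pct, 2 * math.sqrt(-cov))                       # (3.4.4): nearly equal
```

The $\tilde{s}^4/16$ term is negligible for realistic spreads, which is why the approximation in (3.4.4) loses essentially nothing.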


Note that (3.4.4) and (3.2.9) are only well-defined when the return autocovariance is negative, since by construction the bid-ask bounce can only induce negative first-order serial correlation. However, in practice, positive serial correlation in returns is not uncommon, and in these cases Roll simply defines the spread to be (see footnotes a and b of his Table 1)

$$ \tilde{s} = -2\sqrt{\operatorname{Cov}[R_t, R_{t+1}]}. \tag{3.4.5} $$


Boudoukh, Richardson, and Whitelaw (1995), Mech (1993), and Sias and Starks (1994) present additional empirical results on nontrading as a source of autocorrelation. While the papers do not agree on the level of autocorrelation induced by nontrading, all three papers conclude that nontrading cannot completely account for the observed autocorrelations.


This convention seems difficult to justify on economic grounds: negative spreads are typically associated with marketmaking activity, i.e., the provision of liquidity, yet this seems to have little connection with the presence of positive serial correlation in returns. A more plausible alternative interpretation of cases where (3.4.4) is complex-valued is that the Roll (1984) model is misspecified and that additional structure must be imposed to account for the positive serial correlation (see, for example, George et al. [1991], Glosten and Harris [1988], Huang and Stoll [1995a], and Stoll [1989]).
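The two ways of handling a positive autocovariance can be made explicit in an estimator. The sketch below is illustrative; the function name and the `signed_convention` flag are hypothetical. It applies (3.4.4) when the sample first-order autocovariance is nonpositive. Otherwise it either follows Roll's convention (3.4.5) and returns a negative "spread," or returns `None` to flag the case as a likely misspecification, in the spirit of the interpretation just given.

```python
import math

def roll_percentage_spread(returns, signed_convention=True):
    """Roll (1984) effective percentage spread from a series of simple returns."""
    n = len(returns)
    mean = sum(returns) / n
    autocov = sum((returns[t] - mean) * (returns[t - 1] - mean)
                  for t in range(1, n)) / n
    if autocov <= 0:
        return 2.0 * math.sqrt(-autocov)          # (3.4.4)
    if signed_convention:
        return -2.0 * math.sqrt(autocov)          # Roll's convention (3.4.5)
    return None                                   # (3.4.4) would be complex-valued

bounce = [0.01, -0.01] * 50                       # pure bid-ask bounce
trending = ([0.01] * 5 + [-0.01] * 5) * 10        # positively autocorrelated
print(roll_percentage_spread(bounce))
print(roll_percentage_spread(trending), roll_percentage_spread(trending, False))
```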

Roll estimates the effective spreads of NYSE and AMEX stocks year by year using daily returns data from 1963 to 1982, and finds the overall average effective spread to be 0.298% for NYSE stocks and 1.74% for AMEX stocks (recall that AMEX stocks tend to be lower-priced; hence they ought to have larger percentage spreads). However, these figures must be interpreted with caution since 24,358 of the 47,414 estimated effective spreads were negative, suggesting the presence of substantial specification errors. Perhaps another symptom of these specification errors is the fact that estimates of the effective spread based on weekly data differ significantly from those based on daily data. Nevertheless, the magnitudes of these effects are clearly important for empirical applications of transactions data.

Glosten and Harris (1988) refine and estimate Glosten's (1987) decomposition of the bid-ask spread using transactions data for 250 NYSE stocks, and conclude that the permanent adverse-selection component is indeed present in the data. Stoll (1989) develops a similar decomposition of the spread, and using transactions data for National Market System securities on the NASDAQ system from October to December of 1984, he concludes that 43% of the quoted spread is due to adverse selection, 10% is due to inventory-holding costs, and the remaining 47% is due to order-processing costs. George, Kaul, and Nimalendran (1991) allow the expected return of the unobservable "true" price ($P_t^*$ in the notation of Section 3.2.1) to vary through time, and using daily and weekly data for NYSE and AMEX stocks from 1963 to 1985 and NASDAQ stocks from 1983 to 1987, they obtain a much smaller estimate for the portion of the spread attributable to adverse selection (8% to 13%), with the remainder due to order-processing costs and no evidence of inventory costs. Huang and Stoll (1995a) propose a more general model that contains these other specifications as special cases, and estimate the components of the spread to be 21% adverse-selection costs, 14% inventory-holding costs, and 65% order-processing costs using 1992 transactions data for 19 of the 20 stocks in the Major Market Index.

The fact that these estimates vary so much across studies makes it difficult to regard any single study as conclusive. The differences come from two sources: different specifications for the dynamics of the bid-ask spread, and the use of different datasets. There is clearly a need for a more detailed and comprehensive analysis in which all of these specifications are applied to a variety of datasets to gauge the explanatory power and stability of each model.


3.4.3 Transactions Data


In Hausman, Lo, and MacKinlay (1992), three specific aspects of transactions data are examined using the ordered probit model of Section 3.3.3: (1) Does the particular sequence of trades affect the conditional distribution of price changes, e.g., does the sequence of three price changes {+1, −1, +1} have the same effect on the conditional distribution of the next price change as the sequence {−1, +1, +1}? (2) Does trade size affect price changes, and if so, what is the price impact per unit volume of trade from one transaction to the next? (3) Does price discreteness matter? In particular, can the conditional distribution of price changes be modeled as a simple linear regression of price changes on explanatory variables without accounting for discreteness at all?

To address these three questions, Hausman, Lo, and MacKinlay (1992) estimate the ordered probit model for 1988 transactions data of over a hundred stocks. To conserve space, we focus only on their smaller and more detailed sample of six stocks: International Business Machines Corporation (IBM), Quantum Chemical Corporation (CUE), Foster Wheeler Corporation (FWC), Handy and Harman Company (HNH), Navistar International Corporation (NAV), and American Telephone and Telegraph Incorporated (T). For these six stocks, they focus only on intraday transaction price changes, since it has been well documented that overnight returns differ substantially from intraday returns (see, for example, Amihud and Mendelson [1987], Stoll and Whaley [1990], and Wood, McInish, and Ord [1985]). They also impose several other filters to eliminate "problem" transactions and quotes, which yield sample sizes ranging from 3,174 trades for HNH to 206,794 trades for IBM.

They also use bid and ask prices in their analysis, and since bid-ask quotes are reported only when they are revised, some effort is required to match quotes to transactions. A natural algorithm is to match each transaction price to the most recently reported quote prior to the transaction; however, Bronfman (1991) and Lee and Ready (1991) have shown that prices of trades that precipitate quote revisions are sometimes reported with a lag, so that the order of quote revision and transaction price is reversed in official records such as the Consolidated Tape. To address this issue, Hausman, Lo, and MacKinlay (1992) match transaction prices to quotes that are set at least five seconds prior to the transaction; the evidence in Lee and Ready (1991) suggests that this will account for most of the missequencing. This is only one example of the kind of unique challenges that transactions data pose.
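The five-second matching rule amounts to a sorted search over quote timestamps. The sketch below is illustrative: the function name and the timestamps, in seconds, are hypothetical. For each trade it returns the most recent quote set at least `lag` seconds earlier, or `None` if no such quote exists.

```python
import bisect

def match_quotes(trade_times, quote_times, quotes, lag=5.0):
    """Match each trade to the latest quote set at least `lag` seconds before it.

    quote_times must be sorted ascending; quotes[j] is the (bid, ask) pair
    posted at quote_times[j].
    """
    matched = []
    for t in trade_times:
        # Index of the last quote whose time is <= t - lag
        j = bisect.bisect_right(quote_times, t - lag) - 1
        matched.append(quotes[j] if j >= 0 else None)
    return matched

quote_times = [0.0, 10.0, 12.0]
quotes = [(50.0, 50.25), (50.125, 50.375), (50.25, 50.5)]
print(match_quotes([3.0, 6.0, 14.0, 20.0], quote_times, quotes))
```

The trade at 14 seconds is matched to the quote posted at time 0 rather than the revisions at 10 and 12 seconds, because those may have been precipitated by the trade itself.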
To provide some intuition for this enormous dataset, we report a few summary statistics in Table 3.7. Our sample contains considerable price dispersion, with the low stock price ranging from $3.125 for NAV to $104.250 for IBM, and the high ranging from $7.875 for NAV to $129.500 for IBM. At $219 million, HNH has the smallest market capitalization in our sample, and IBM has the largest with a market value of $69.8 billion.

The empirical analysis also requires some indicator of whether a transaction was buyer-initiated or seller-initiated; otherwise the notion of price impact is ill-defined, since a 100,000-share block purchase has quite a different price impact from a 100,000-share block sale. Obviously, this is a difficult task because for every trade there is always a buyer and a seller. What we hope to capture is which of the two parties is more anxious to consummate the trade and is therefore willing to pay for it by being closer to the bid price or the ask price. Perhaps the most obvious indicator is whether the transaction occurs at the ask price or at the bid price; if it is the former then the transaction is most likely a "buy," and if it is the latter then the transaction is most likely a "sell." Unfortunately, a large number of transactions occur at prices strictly within the bid-ask spread, so that this method for signing trades will leave the majority of trades indeterminate.

Hausman, Lo, and MacKinlay (1992) use the well-known algorithm of 
signing a transaction as a buy if the transaction price is higher than the mean 
of the prevailing bid-ask quote (the most recent quote that is set at least five 
seconds prior to the trade); they classify it as a sell if the price is lower. If 
the price equals the mean of the prevailing bid-ask quote, they classify the 
trade as an indeterminate trade. This method yields far fewer indeterminate 
trades than classifying according to transactions at the bid or at the ask. 
Unfortunately, little is known about the relative merits of this method of 
classification versus others such as the tick test (which classifies a transaction 
as a buy, a sell, or indeterminate if its price is greater than, less than, or equal 
to the previous transaction's price, respectively), simply because it is virtually 
impossible to obtain the data necessary to evaluate these alternatives.
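The two signing rules can be compared side by side in a minimal Python sketch. The function names are illustrative. The sketch assumes prices on the eighths grid, which binary floating point represents exactly, so the equality tests are safe. With arbitrary decimal prices, one would compare integer tick counts instead.

```python
def midquote_sign(price, bid, ask):
    """Midquote rule: +1 (buy) above the bid-ask midpoint, -1 (sell) below it, 0 if equal."""
    mid = (bid + ask) / 2.0
    if price > mid:
        return 1
    if price < mid:
        return -1
    return 0


def tick_test_sign(price, prev_price):
    """Tick test: +1 if the price exceeds the previous trade price, -1 if below, 0 if equal."""
    if price > prev_price:
        return 1
    if price < prev_price:
        return -1
    return 0
```

Consider a bid of 100 and an ask of 100 1/4, with a trade at 100 1/8 that follows a trade at 100. The midquote rule calls this trade indeterminate, while the tick test calls it a buy, which shows how the two rules can disagree.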

The Empirical Specification

To estimate the parameters of the ordered probit model via maximum likeli-
hood, three specification decisions must be made: (i) the number of states
m, (ii) the explanatory variables X_k, and (iii) the parametrization of the
variance σ_k^2.

In choosing m, we must balance price resolution against the practical
constraint that too large an m will yield no observations in the extreme states
s_1 and s_m. For example, if we set m to 101 and define the states s_1 and s_101
symmetrically to be price changes of -50 ticks and +50 ticks, respectively,
we would find no Y_k's among our six stocks falling into these two states.
Using the empirical distribution of the data as a guide, Hausman, Lo, and
MacKinlay (1992) set m = 9 for the larger stocks, implying extreme states
of -4 ticks or less and +4 ticks or more. For the two smaller stocks, FWC
and HNH, they set m = 5, implying extreme states of -2 ticks or less and
+2 ticks or more.

Table 3.7. Summary statistics for transactions data of six stocks.

Variable                          IBM       CUE       FWC       HNH       NAV         T
Low Price                     104.250    65.500    11.500    14.250     3.125    24.125
High Price                    129.500   108.250    17.250    18.500     7.875    30.375
Market Value ($Billions)       69.815     2.167     0.479     0.219     0.998    28.990
% Trades at Prices:
  > Midquote                    43.81     43.19     37.13     22.53     40.80     32.37
  = Midquote                    12.66     18.67     23.58     26.28     18.11     25.92
  < Midquote                    43.53     38.14     39.29     51.20     41.08     41.71
Price Change, Y_k
  Mean                        -0.0010    0.0016   -0.0017   -0.0028   -0.0002    0.0001
  Std. Dev.                    0.7530    1.2353    0.6390    0.7492    0.6445    0.6540
Time Between Trades, Δt_k
  Mean                          27.21    203.52    296.54   1129.37     58.36     31.00
  Std. Dev.                     34.13    282.16    416.49   1497.44     76.53     34.39
Bid-Ask Spread, AB_k
  Mean                         1.9470    3.2909    2.0830    2.4707    1.4616    1.6564
  Std. Dev.                    1.4625    1.6203    1.1682    0.8994    0.6713    0.7930
S&P 500 Futures Return
  Mean                         0.0000    0.0004   -0.0017   -0.0064    0.0001   -0.0001
  Std. Dev.                    0.0716    0.1387    0.1475    0.1963    0.1038    0.0765
Buy-Sell Indicator, IBS_k
  Mean                         0.0028    0.0505   -0.0216   -0.2867   -0.0028   -0.0933
  Std. Dev.                    0.9346    0.9005    0.8739    0.8095    0.9049    0.8556
Signed Transformed Volume
  Mean                         0.1059    0.3574   -0.0523   -1.9543    0.0332   -0.4256
  Std. Dev.                    6.1474    5.6643    6.2798    6.0890    6.9705    7.5846
Median Trading Volume ($)      57,375    40,900     6,150     5,363     3,000     7,050

Summary statistics for transaction prices and corresponding ordered probit explanatory vari-
ables of International Business Machines Corporation (IBM, 206,794 trades), Quantum Chem-
ical Corporation (CUE, 26,927 trades), Foster Wheeler Corporation (FWC, 18,199 trades),
Handy and Harman Company (HNH, 3,174 trades), Navistar International Corporation (NAV,
96,127 trades), and American Telephone and Telegraph Company (T, 180,726 trades), for the
period from January 4, 1988 to December 30, 1988.

The explanatory variables X_k are selected to capture several aspects of
transaction price changes:

- clock-time effects (such as the arithmetic Brownian motion model);
- the effects of bid-ask bounce, since many transactions are merely
  movements from the bid price to the ask price or vice versa;
- the size of the transaction, so that price impact can be determined as
  a function of the quantity traded;
- the impact of "systematic" or marketwide movements on the conditional
  distribution of an individual stock's price changes.

These aspects call for the following explanatory variables:


Δt_k: The time elapsed between transactions k-1 and k, in seconds.

AB_{k-1}: The bid-ask spread prevailing at time t_{k-1}, in ticks.

Y_{k-l}: Three lags [l = 1, 2, 3] of the dependent variable Y_k. Recall that for
m = 9, price changes less than -4 ticks are set equal to -4 ticks (state
s_1), and price changes greater than +4 ticks are set equal to +4 ticks
(state s_9), and similarly for m = 5.

V_{k-l}: Three lags [l = 1, 2, 3] of the dollar volume of the (k-l)th trans-
action. Dollar volume is defined as the price of the (k-l)th transaction (in
dollars, not ticks) times the number of shares traded (denominated in hundreds
of shares), so it is denominated in hundreds of dollars.
To reduce the influence of outliers, if the share volume of a trade ex-
ceeds the 99.5th percentile of the empirical distribution of share volume
for that stock, it is set equal to the 99.5th percentile.

SP500_{k-l}: Three lags [l = 1, 2, 3] of five-minute continuously com-
pounded returns of the Standard and Poor's (S&P) 500 index futures
price, for the contract maturing in the closest month beyond the month
in which transaction k-l occurred. The return is computed from two futures
prices: the price recorded one minute before the nearest round minute prior
to t_{k-l}, and the price recorded five minutes before that.

IBS_{k-l}: Three lags [l = 1, 2, 3] of an indicator variable. It takes the value
+1 if the (k-l)th transaction price is greater than the average of the quoted
bid and ask prices at time t_{k-l}. It takes the value -1 if the (k-l)th
transaction price is less than the average of the bid and ask prices at time
t_{k-l}, and zero otherwise. Writing P^a_{k-l} and P^b_{k-l} for the ask and
bid prices prevailing at t_{k-l}, this is

\[
IBS_{k-l} = \begin{cases}
+1 & \text{if } P_{k-l} > \tfrac{1}{2}\left(P^a_{k-l} + P^b_{k-l}\right), \\
0 & \text{if } P_{k-l} = \tfrac{1}{2}\left(P^a_{k-l} + P^b_{k-l}\right), \\
-1 & \text{if } P_{k-l} < \tfrac{1}{2}\left(P^a_{k-l} + P^b_{k-l}\right).
\end{cases}
\]

The specification of X_k'β is then given by the following expression:

\[
\begin{aligned}
X_k'\beta ={}& \beta_1 \Delta t_k + \beta_2 Y_{k-1} + \beta_3 Y_{k-2} + \beta_4 Y_{k-3}
 + \beta_5 SP500_{k-1} + \beta_6 SP500_{k-2} \\
&+ \beta_7 SP500_{k-3} + \beta_8 IBS_{k-1} + \beta_9 IBS_{k-2} + \beta_{10} IBS_{k-3} \\
&+ \beta_{11}\{T_\lambda(V_{k-1}) \cdot IBS_{k-1}\} + \beta_{12}\{T_\lambda(V_{k-2}) \cdot IBS_{k-2}\} \\
&+ \beta_{13}\{T_\lambda(V_{k-3}) \cdot IBS_{k-3}\}.
\end{aligned}
\]


The variable Δt_k is included in X_k to allow for clock-time effects on the
conditional mean of Y_k^*. If prices are stable in transaction time rather
than clock time, this coefficient should be zero. Lagged price changes
are included to account for serial dependencies, and lagged returns of the
S&P 500 index futures price are included to account for marketwide effects
on price changes.
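The variable definitions above translate into a few small data-preparation steps, and the Python sketch below is one way to express them. It rests on several assumptions. The helper names are hypothetical. The 99.5th percentile is taken as a simple order statistic, and other quantile conventions give slightly different caps. The Box-Cox transformed volumes T_λ(V_{k-l}) are assumed to have been computed already.

```python
def truncate_change(ticks, max_ticks):
    """Clamp a price change (in ticks) to [-max_ticks, +max_ticks].

    max_ticks = 4 corresponds to m = 9 states and max_ticks = 2 to m = 5.
    """
    return max(-max_ticks, min(max_ticks, ticks))


def cap_share_volume(volumes, q=0.995):
    """Cap share volumes at the empirical q-quantile (99.5th percentile by default).

    The quantile is the simple order statistic sorted(volumes)[int(q * n)].
    """
    ordered = sorted(volumes)
    cap = ordered[min(len(ordered) - 1, int(q * len(ordered)))]
    return [min(v, cap) for v in volumes]


def dollar_volume(price_dollars, shares):
    """Dollar volume in hundreds of dollars: price (dollars) times hundreds of shares."""
    return price_dollars * (shares / 100.0)


def linear_index(beta, dt, y_lags, sp_lags, ibs_lags, tv_lags):
    """Return X_k'beta for beta = [beta_1, ..., beta_13] ordered as in the text.

    Each *_lags argument holds the (lag 1, lag 2, lag 3) values. tv_lags are the
    already-transformed volumes T_lambda(V_{k-l}), which enter only through their
    products with IBS_{k-l}.
    """
    x = ([dt] + list(y_lags) + list(sp_lags) + list(ibs_lags)
         + [tv * s for tv, s in zip(tv_lags, ibs_lags)])
    if len(beta) != 13 or len(x) != 13:
        raise ValueError("expected 13 coefficients and 13 regressors")
    return sum(b * xi for b, xi in zip(beta, x))
```

Table 3.8b reports the coefficients on Δt_k/100. A caller matching those estimates would therefore pass Δt_k divided by 100 as `dt`.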

To measure the price impact of a trade per unit volume, the term
T_λ(V_{k-l}) is included. This is dollar volume transformed according to the
Box and Cox (1964) transformation T_λ(·):

\[
T_\lambda(x) = \begin{cases}
\dfrac{x^\lambda - 1}{\lambda}, & 0 < \lambda \le 1, \\
\log x, & \lambda = 0,
\end{cases}
\]

where λ ∈ [0, 1] is also a parameter to be estimated. The Box-Cox trans-
formation allows dollar volume to enter into the conditional mean nonlin-
early. This is a particularly important innovation, since common intuition suggests
that price impact may exhibit economies of scale with respect to dollar vol-
ume. That is, although total price impact is likely to increase with volume, the
marginal price impact probably does not. The Box-Cox transformation cap-
tures the linear specification (λ = 1) and concave specifications up to and
including the logarithmic function (λ = 0). The estimated curvature of
this transformation will play an important role in the measurement of price
impact.
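A direct Python transcription of T_λ(·) makes the economies-of-scale point concrete. The function name `box_cox` and the volume levels in the loop are illustrative choices, not values from the study.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform T_lambda(x) for x > 0 and 0 <= lam <= 1."""
    if x <= 0:
        raise ValueError("Box-Cox transform requires x > 0")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1] in this specification")
    if lam == 0.0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Effect of an extra 100 units of dollar volume on a small and on a large trade.
for lam in (1.0, 0.5, 0.0):
    small = box_cox(200.0, lam) - box_cox(100.0, lam)
    large = box_cox(1100.0, lam) - box_cox(1000.0, lam)
    print(f"lambda={lam}: small-trade increment {small:.3f}, large-trade increment {large:.3f}")
```

With λ = 1 the two increments coincide, which is the linear case. For any λ < 1 the increment at the larger trade size is smaller, which is the concavity the specification is designed to allow.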

The transformed dollar volume variable is interacted with IBS_{k-l}. This is
an indicator of whether the trade was buyer-initiated (IBS_{k-l} = 1),
seller-initiated (IBS_{k-l} = -1), or indeterminate (IBS_{k-l} = 0). A positive
β_11 would imply that buyer-initiated trades tend to push prices up and
seller-initiated trades tend to drive prices down. Several information-based
models of trading predict such a relation, e.g., Easley and O'Hara (1987).
Moreover, the magnitude of β_11 is the per-unit volume impact on the conditional
mean of Y_k^*, which may be readily translated into the impact on the conditional
probabilities of observed price changes. The signs and magnitudes of β_12
and β_13 measure the persistence of price impact.

Finally, to complete the specification, the conditional variance σ_k^2 must
be parametrized. To allow for clock-time effects, Δt_k is included. There is
also some evidence linking bid-ask spreads to the information content and
volatility of price changes (see, for example, Glosten [1987], Hasbrouck
[1988, 1991], and Petersen and Umlauf [1990]), so the lagged spread AB_{k-1}
is also included. Because the parameter vectors α, β, and γ are unidentified
without additional restrictions, the constant term γ_0^2 of the variance is
set to one. This yields the specification

\[
\sigma_k^2 = 1 + \gamma_1^2 \, \Delta t_k + \gamma_2^2 \, AB_{k-1}.
\]
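Given X_k'β and the variance specification above, each trade contributes the log of an ordered probit state probability to the likelihood. The sketch below assumes the standard ordered probit form, Y_k^* = X_k'β + ε_k with Gaussian ε_k. Under that form, P(Y_k = s_j) = Φ((α_j - X_k'β)/σ_k) - Φ((α_{j-1} - X_k'β)/σ_k), with α_0 = -∞ and α_m = +∞. The function names are hypothetical, and X_k'β is taken as precomputed for each trade.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, Phi(z), computed from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def state_probs(alphas, mean, sigma):
    """Ordered probit probabilities P(Y_k = s_j), j = 1, ..., m.

    alphas holds the increasing boundaries [alpha_1, ..., alpha_{m-1}]. The
    0.0 and 1.0 end points stand in for alpha_0 = -inf and alpha_m = +inf.
    """
    cdf = [0.0] + [norm_cdf((a - mean) / sigma) for a in alphas] + [1.0]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


def log_likelihood(states, means, dts, spreads, alphas, gamma1, gamma2):
    """Ordered probit log-likelihood with sigma_k^2 = 1 + g1^2 * dt_k + g2^2 * AB_{k-1}.

    states:  0-based index of the observed state for each trade
    means:   X_k'beta for each trade
    dts:     Delta t_k for each trade
    spreads: AB_{k-1} for each trade
    """
    total = 0.0
    for j, mu, dt, ab in zip(states, means, dts, spreads):
        sigma = math.sqrt(1.0 + gamma1 ** 2 * dt + gamma2 ** 2 * ab)
        total += math.log(state_probs(alphas, mu, sigma)[j])
    return total
```

Maximum likelihood estimation would then maximize this function over α, β, γ_1, γ_2, and λ with a numerical optimizer, for example by minimizing its negative with scipy.optimize.minimize. That step is not shown. States far in the tails can produce probabilities that underflow to zero, so a production version would need to guard the logarithm.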


In summary, the 9-state specification requires the estimation of 24 param-
eters:

- the partition boundaries α_1, ..., α_8;
- the variance parameters γ_1 and γ_2;
- the coefficients of the explanatory variables β_1, ..., β_13;
- the Box-Cox parameter λ.

The 5-state specification requires the estimation of only 20 parameters.

Table 3.8a. Estimates of ordered probit partition boundaries.

Parameter        IBM         CUE         FWC        HNH         NAV           T
α_1           -4.670      -6.213      -4.378     -4.456      -7.263      -8.073
            (-145.65)    (-18.92)    (-25.24)    (-5.98)    (-39.28)    (-56.95)
α_2           -4.57       -5.447      -1.719     -1.80       -7.010      -7.270
            (-157.75)    (-18.99)    (-25.96)    (-5.92)    (-36.53)    (-62.40)
α_3           -3.109      -2.795       1.679      1.993      -6.251      -5.472
            (-171.59)    (-19.14)     (26.32)     (5.97)    (-37.22)       (…)
α_4           -1.344      -1.764       4.334      4.477      -1.972      -1.850
            (-155.47)    (-18.95)     (25.26)     (5.85)    (-34.59)    (-61.41)
α_5            1.326       1.605        n/a        n/a        1.938       1.977
             (154.91)     (18.81)                             (34.66)     (62.82)
α_6            3.190       2.774        n/a        n/a        6.301       5.378
             (167.81)     (19.11)                             (36.86)     (62.43)
α_7            4.200       5.502        n/a        n/a        7.742       7.294
             (452.17)     (19.10)                             (31.63)     (57.63)
α_8            4.732       6.150        n/a        n/a        8.633       8.156
             (138.75)     (16.94)                             (30.26)     (56.23)

Maximum likelihood estimates of the partition boundaries of the ordered probit model for
transaction price changes of International Business Machines Corporation (IBM, 206,794
trades), Quantum Chemical Corporation (CUE, 26,927 trades), Foster Wheeler Corporation
(FWC, 18,199 trades), Handy and Harman Company (HNH, 3,174 trades), Navistar Interna-
tional Corporation (NAV, 96,127 trades), and American Telephone and Telegraph Company
(T, 180,726 trades), for the period from January 4, 1988 to December 30, 1988.


The Maximum Likelihood Estimates

Tables 3.8a and 3.8b report the maximum likelihood estimates of the or-
dered probit model for the six stocks. Table 3.8a contains the estimates of the
boundary partitions α_i, and Table 3.8b contains the estimates of the "slope"
coefficients β_i and γ_i. Entries in each of the columns labeled with ticker symbols
are the parameter estimates for that stock. The z-statistics in parentheses
below each estimate are asymptotically standard normal under the null
hypothesis that the corresponding coefficient is zero.

Table 3.8a shows that the partition boundaries are estimated with high
precision for all stocks. As expected, the z-statistics are much larger for
those stocks with many more observations. Note that the partition boundaries
are not evenly spaced. For IBM, e.g., α_4 - α_3 = 1.765, whereas α_5 - α_4 = 2.670,
and it can be shown that these two values are statistically different. One
implication is that the eighths-barrier model of discrete prices, e.g., that of
Marsh and Rosenfeld (1986), is not consistent with these transactions data.
Another implication is that the estimated conditional probabilities of price
changes need not look normal. They may (and do) display a clustering phe-
nomenon similar to the clustering of the unconditional distribution of price
changes on even eighths.

Table 3.8b. Estimates of ordered probit "slope" coefficients.

Parameter                        IBM        CUE        FWC        HNH        NAV          T
γ_1: Δt_k/100                  0.399      0.499      0.275      0.187      0.428      0.387
                             (15.57)    (11.62)    (11.26)     (4.07)    (10.01)     (8.89)
γ_2: AB_{k-1}                  0.515      1.110      0.723      1.109      0.869     0.8608
                             (71.08)    (15.39)    (14.54)     (4.48)    (19.93)    (38.16)
β_1: Δt_k/100                 -0.115     -0.014     -0.013     -0.010     -0.032     -0.127
                            (-11.42)    (-2.14)    (-3.50)    (-2.69)    (-3.82)    (-9.51)
β_2: Y_{k-1}                  -1.012     -0.333     -1.325     -0.740     -2.600     -2.346
                           (-135.57)   (-13.46)   (-24.49)    (-5.18)   (-36.32)   (-62.74)
β_3: Y_{k-2}                  -0.532     -0.000     -0.638     -0.406     -1.521        …
                            (-85.00)    (-0.03)   (-16.45)    (-4.06)   (-34.13)       (…)
β_4: Y_{k-3}                  -0.211     -0.020     -0.223     -0.116     -0.536     -0.501
                            (-47.15)    (-1.42)    (-9.23)    (-1.84)   (-31.63)   (-47.91)
β_5: SP500_{k-1}               1.120      2.292      1.359      0.472      0.419      0.625
                             (54.22)    (13.54)    (43.49)     (1.36)     (8.05)    (17.12)
β_6: SP500_{k-2}              -0.257      1.373      0.302      0.448      0.150      0.177
                            (-12.06)     (9.61)     (2.03)     (1.20)     (2.87)     (4.96)
β_7: SP500_{k-3}               0.006      0.677      0.204      0.388      0.159      0.141
                              (0.26)     (5.15)     (1.07)     (1.13)     (3.02)     (3.93)
β_8: IBS_{k-1}                -1.137     -1.915     -0.791     -0.803     -0.501     -0.740
                            (-63.64)   (-15.36)    (-7.81)    (-2.89)   (-17.38)   (-23.01)
β_9: IBS_{k-2}                 0.369     -0.279     -0.184     -0.184     -0.370     -0.330
                             (21.55)    (-3.37)    (-3.66)    (-0.75)   (-15.38)   (-18.11)
β_10: IBS_{k-3}               -0.174      0.079     -0.177     -0.022     -0.301        …
                            (-10.29)     (0.98)    (-3.64)    (-0.17)   (-15.37)   (-19.78)
β_11: T_λ(V_{k-1})·IBS_{k-1}   0.122      0.217      0.050      0.038      0.013      0.032
                             (47.37)    (12.97)     (1.80)     (0.55)     (2.50)     (4.51)
β_12: T_λ(V_{k-2})·IBS_{k-2}   0.047      0.036      0.015      0.036      0.011      0.014
                             (18.57)     (2.82)     (1.54)     (0.55)     (2.54)     (4.22)
β_13: T_λ(V_{k-3})·IBS_{k-3}   0.019      0.007      0.015     -0.006      0.005      0.005
                              (7.70)     (0.59)     (1.56)    (-0.34)     (2.09)     (3.02)

Maximum likelihood estimates of the "slope" coefficients of the ordered probit model for trans-
action price changes of International Business Machines Corporation (IBM, 206,794 trades),
Quantum Chemical Corporation (CUE, 26,927 trades), Foster Wheeler Corporation (FWC,
18,199 trades), Handy and Harman Company (HNH, 3,174 trades), Navistar International
Corporation (NAV, 96,127 trades), and American Telephone and Telegraph Company (T,
180,726 trades), for the period from January 4, 1988 to December 30, 1988.

Table 3.8b shows that Δt_k has only a marginal effect on the conditional means
of the Y_k^*'s for all six stocks. Moreover, the z-statistics are minuscule,
especially in light of the large sample sizes. However, Δt_k does enter into
the σ_k^2 expression significantly. In fact, since all the parameters for σ_k^2
are significant, homoskedasticity may be rejected. Hence clock time is
important for the conditional variances, but not for the conditional means of
Y_k^*. Note that this does not necessarily imply the same for the conditional
distribution of the Y_k's, which is nonlinearly related to the conditional distri-
bution of the Y_k^*'s. For example, the conditional mean of the Y_k's may well
depend on the conditional variance of the Y_k^*'s. Clock time can therefore still
affect the conditional mean of observed price changes even though it does
not affect the conditional mean of Y_k^*.
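A small numerical check of this point uses the partition boundaries reported for T in Table 3.8a. Holding the conditional mean of Y_k^* fixed at zero and changing only σ_k shifts the mean of the observed price change. The choice of σ_k = 1 and σ_k = 2 is arbitrary. Treating the extreme states as exactly ±4 ticks mirrors the recoding of Y_k described earlier.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def state_probs(alphas, mean, sigma):
    """Ordered probit state probabilities for increasing boundaries alphas."""
    cdf = [0.0] + [norm_cdf((a - mean) / sigma) for a in alphas] + [1.0]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


# Partition boundaries alpha_1, ..., alpha_8 for T (AT&T) from Table 3.8a.
ALPHAS_T = [-8.073, -7.270, -5.472, -1.850, 1.977, 5.378, 7.294, 8.156]
# States s_1, ..., s_9 recoded as -4, ..., +4 ticks.
TICKS = list(range(-4, 5))


def mean_observed_change(mean_star, sigma):
    """E[Y_k] in ticks when Y_k^* has conditional mean mean_star and std. dev. sigma."""
    probs = state_probs(ALPHAS_T, mean_star, sigma)
    return sum(t * p for t, p in zip(TICKS, probs))


low_var = mean_observed_change(0.0, 1.0)
high_var = mean_observed_change(0.0, 2.0)
print(f"sigma_k = 1: {low_var:.4f} ticks; sigma_k = 2: {high_var:.4f} ticks")
```

The boundaries are not symmetric about zero (|α_4| = 1.850 versus α_5 = 1.977). As a result, the implied mean of Y_k is slightly negative even with X_k'β = 0, and its value changes with σ_k alone.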


Order Flow, Discreteness, and Price Impact

More striking is the significance and sign of the lagged price change coeffi-
cients β_2, β_3, and β_4. They are negative for all stocks, implying a tendency
towards price reversals. For example, if the past three price changes were
each one tick, the conditional mean of Y_k^* changes by β_2 + β_3 + β_4. However,
if the sequence of price changes was 1/-1/1, then the effect on the condi-
tional mean is β_2 - β_3 + β_4, a quantity closer to zero for each security's
parameter estimates.
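The reversal arithmetic can be reproduced from the IBM column of Table 3.8b. This snippet only evaluates the two linear combinations named in the text.

```python
# IBM estimates of beta_2, beta_3, beta_4 (lags 1, 2, 3 of Y_k) from Table 3.8b.
BETA_Y = (-1.012, -0.532, -0.211)


def lagged_change_effect(y1, y2, y3, beta=BETA_Y):
    """Contribution beta_2*Y_{k-1} + beta_3*Y_{k-2} + beta_4*Y_{k-3} to X_k'beta."""
    b2, b3, b4 = beta
    return b2 * y1 + b3 * y2 + b4 * y3


same_direction = lagged_change_effect(1, 1, 1)  # three consecutive +1-tick changes
alternating = lagged_change_effect(1, -1, 1)    # the 1/-1/1 sequence
print(f"{same_direction:.3f} {alternating:.3f}")
```

This gives -1.755 for three consecutive upticks and -0.691 for the 1/-1/1 sequence.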

Note that these coefficients measure reversal tendencies beyond those
induced by the presence of a constant bid-ask spread as in Roll (1984).
The indicator variables IBS_{k-1}, IBS_{k-2}, and IBS_{k-3} should capture the
effect of bid-ask bounce on the conditional mean. In the absence of
all other information (such as market movements or past price changes),
these variables pick up any price effects that buys and sells might have on
the conditional mean. As expected, the estimated coefficients are generally
negative, indicating the presence of reversals due to movements from bid
to ask or ask to bid prices. Hausman, Lo, and MacKinlay (1992) compare
their magnitudes formally. They conclude that the conditional mean of price
changes is path-dependent with respect to past price changes: the sequence of
price changes, or order flow, matters.

Using these parameter estimates, Hausman, Lo, and MacKinlay (1992)
are also able to address the second two questions they put forward. Price
impact, the effect of a trade on the market price, can be quantified with
relatively high precision. It does increase with trade size, although not lin-
early, and it differs from stock to stock. The more liquid stocks such as IBM
tend to have relatively flat price-impact functions, whereas less liquid stocks
such as HNH are more sensitive to trade size (see, in particular, Hausman,
Lo, and MacKinlay [1992, Figure 1]).

Also, discreteness does matter. The conditional distribution of price changes
implied by the ordered probit specification can capture certain nonlinearities
that other techniques such as ordinary least squares cannot, for example
price clustering on even eighths versus odd eighths.

While it is still too early to say whether the ordered probit model will
have broader applications in market microstructure studies, it is currently
the only model that can capture discreteness, irregular trade intervals, and
the effects of economic variables on transaction prices in a relatively parsi-
monious fashion.


3.5 Conclusion 


Thanks to the plethora of newly available transactions databases, many
outstanding economic and econometric issues in the market microstructure
literature can now be resolved. In this chapter we have touched on
only three of the issues that are part of the burgeoning market microstruc-
ture literature: nonsynchronous trading, the bid-ask spread, and modeling
transactions data. However, the combination of transactions databases and
ever-increasing computing power is sure to create many new directions of
research. For example, the measurement and control of trading costs has
been of primary concern to large institutional investors. Yet there has been
relatively little academic research devoted to this important topic because
the necessary data were unavailable until recently. Similarly, measures of
market transparency, liquidity, and competitiveness all figure prominently
in recent theoretical models of security prices. Until recently, however, it has
been virtually impossible to implement any of these theories because of a lack
of data. The experimental markets literature has also contributed many
insights into market microstructure issues, but its enormous potential is only
beginning to be realized. Interest in market microstructure is growing among
academics, investment professionals and, most recently, policymakers involved
in rewriting securities markets regulations. The next few years are sure to be
an extremely exciting and fertile period for this area.


Problems—Chapter 3 
3.1 Derive the mean, variance, autocovariance, and autocorrelation func-
tions (3.1.9)-(3.1.12) of the observed returns process r_{it}^o for the nontrading
model of Section 3.1. Hint: Use the representation (3.1.4).
3.2 Under the nontrading process defined by (3.1.2)-(3.1.3), and assum-
ing that virtual returns have a linear one-factor structure (3.1.1), show how
nontrading affects the estimated beta of a typical security. Recall that a secu-
rity's beta is defined as the slope coefficient of a regression of the security's
returns on the return of the market portfolio.


3.3 Suppose that the trading process {δ_it} defined in (3.1.2) were not IID,
but instead followed a two-state Markov chain with transition probabilities
given by

\[
\begin{array}{c|cc}
 & \delta_{i,t+1} = 0 & \delta_{i,t+1} = 1 \\ \hline
\delta_{it} = 0 & \pi_i & 1 - \pi_i \\
\delta_{it} = 1 & 1 - \pi_i' & \pi_i'
\end{array}
\qquad (3.5.1)
\]

3.3.1 Derive the unconditional mean, variance, first-order autocovari-
ance, and steady-state distribution of δ_it as functions of π_i and π_i'.

3.3.2 Calculate the mean, variance, and autocorrelation function of the
observed returns process r_{it}^o under (3.5.1). How does serial correlation in
δ_it affect the moments of observed returns?


3.3.3 Using daily returns for any individual security, estimate the param-
eters π_i and π_i', assuming that the virtual returns process is IID. Are the
estimates empirically plausible?


3.4 Extend the Roll (1984) model to allow for a serially correlated order-
type indicator variable. In particular, let I_t be a two-state Markov chain with -1
and 1 as the two states, and derive expressions for the moments of ΔP_t in
terms of s and the transition probabilities of I_t. How do these results differ
from the IID case? How would you reinterpret Roll's (1984) findings in
light of this more general model of bid-ask bounce?


3.5 How does price discreteness affect the sampling properties of the mean, 
standard deviation, and first-order autocorrelation estimators, if at all? Hint: 
Simulate continuous-state prices with various starting price levels, round 
to the nearest eighth, calculate the statistics of interest, and tabulate the 
relevant sampling distributions. 
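One way to set up the simulation suggested in the hint is sketched below in Python. The lognormal random walk, its volatility, the sample length, and the function names are all assumptions made for illustration. Rounding uses Python's round(), which breaks exact ties to even; for continuous draws, an exact tie has probability zero.

```python
import math
import random
import statistics


def round_to_eighth(p):
    """Round a price to the nearest 1/8."""
    return round(p * 8) / 8


def first_order_autocorr(x):
    """Sample first-order autocorrelation of a sequence."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def simple_returns(prices):
    return [b / a - 1.0 for a, b in zip(prices[:-1], prices[1:])]


def simulate(start_price, n=1000, sigma=0.01, seed=0):
    """Simulate an (assumed) lognormal random walk and its eighths-rounded version.

    Returns {"continuous": stats, "rounded": stats}, where stats is
    (mean, standard deviation, first-order autocorrelation) of simple returns.
    """
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(n):
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, sigma)))
    rounded = [round_to_eighth(p) for p in prices]
    results = {}
    for label, path in (("continuous", prices), ("rounded", rounded)):
        r = simple_returns(path)
        results[label] = (statistics.fmean(r), statistics.stdev(r),
                          first_order_autocorr(r))
    return results
```

Repeat `simulate` over many seeds and several starting prices, keeping the price well above 1/16 so that rounded prices never reach zero. Tabulating the three statistics for each path then gives the sampling distributions the problem asks for.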


3.6 The following questions refer to an extract of the NYSE's TAQ Database
which consists of all transactions for IBM stock that occurred on January 4th
and 5th, 1988 (2,748 trades).


3.6.1 Construct a histogram for IBM's stock price. Do you see any ev-
idence of price clustering? Construct a histogram for IBM's stock price
changes. Is there any price-change clustering? Construct the following
two histograms and compare and contrast them: the histogram of price changes
conditional on prices falling on an even eighth, and the histogram of price
changes conditional on prices falling on an odd eighth. Using these his-
tograms, comment on the importance or unimportance of discrete prices
for statistical inference.


3.6.2 What is the average time between trades for IBM? Construct a
95% confidence interval about this average. Using these quantities and
the central limit theorem, what is the probability that IBM does not trade
in any given one-minute interval? Divide the trading day into one-minute
intervals, and estimate directly the unconditional and conditional probabil-
ities of nontrading, where the conditional probabilities are conditioned
on whether a trade occurred during the previous minute (hint: think
about Markov chains). Is the nontrading process independent?


3.6.3 Plot price and volume on the same graph, with time-of-day as the
horizontal axis. Are there any discernible patterns? Propose and perform
statistical tests of such patterns. Also test for other patterns that might not
be visible to the naked eye but are motivated by economic considerations,
e.g., whether block trades are followed by larger price changes than nonblock
trades.*


3.6.4 Devise and estimate a model that measures price impact, i.e., the
actual cost of trading n shares of IBM. Feel free to use any statistical
methods at your disposal; there is no single right answer (in particular,
ordered probit is not necessarily the best way to do this). Think carefully
about the underlying economic motivation for measuring price impact.


3.7 The following questions refer to an extract of the NYSE's TAQ Database 
which consists of bid-ask quote revisions and depths for IBM stock that were 
displayed during January 4th and 5th, 1988 (1,327 quote revisions). 


3.7.1 Construct a histogram for IBM's bid-ask spread. Can you conclude
from this that the dynamics of the bid-ask spread are unimportant? Why
or why not? You may wish to construct various conditional histograms to
properly answer this question.


3.7.2 Are there any discernible relations between revisions in the bid-ask
quotes and transactions? That is, do revisions in bid-ask quotes "cause"
trades to occur, or do trades motivate revisions in the quotes? Propose
and estimate a model to answer this question.


3.7.3 How are changes in the bid and ask prices related to volume, if at
all? For example, do quote revisions cause trades to occur, or do trades
motivate revisions in the quotes? Propose and estimate a model to answer
this question.


* The NYSE defines a block trade as any trade consisting of 10,000 shares or more.

3.7.4 Consider an asset allocation rule in which an investor invests fully
in stocks until experiencing a sequence of three consecutive declines. The
investor then switches completely into bonds until experiencing a sequence
of six consecutive advances. Implement this rule for an initial investment
of $100,000 with the transactions data, but do it two ways: (1) use the
average of the bid and ask prices for purchases or sales; (2) use the ask price
for purchases and the bid price for sales. How much do you have left at
the end of two days of trading? You may assume a zero riskfree rate for
this exercise.
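A minimal Python sketch of the switching rule in Problem 3.7.4 follows. It makes explicit assumptions about details the problem leaves open:

- shares may be fractional;
- declines and advances are measured on successive transaction prices;
- an unchanged price resets both runs;
- any stock still held at the end is valued at the price at which it could be sold under each convention.

The function name and data layout are hypothetical.

```python
def run_switching_rule(trades, cross_spread, initial=100_000.0):
    """Ending wealth under the three-declines / six-advances switching rule.

    trades: time-ordered (price, bid, ask) tuples.
    cross_spread=False executes at the midquote (bid + ask) / 2;
    cross_spread=True buys at the ask and sells at the bid.
    Cash (the bond position) earns a zero riskfree rate.
    """
    _, bid, ask = trades[0]
    shares = initial / (ask if cross_spread else (bid + ask) / 2)
    cash = 0.0
    downs = ups = 0
    for (prev, _, _), (price, bid, ask) in zip(trades, trades[1:]):
        if price < prev:
            downs, ups = downs + 1, 0
        elif price > prev:
            downs, ups = 0, ups + 1
        else:
            downs = ups = 0  # assumption: an unchanged price breaks both runs
        mid = (bid + ask) / 2
        if shares > 0 and downs >= 3:
            # Switch out of stocks: sell everything at the bid or the midquote.
            cash, shares, downs = shares * (bid if cross_spread else mid), 0.0, 0
        elif shares == 0 and ups >= 6:
            # Switch back into stocks: buy at the ask or the midquote.
            shares, cash, ups = cash / (ask if cross_spread else mid), 0.0, 0
    _, bid, ask = trades[-1]
    return cash + shares * (bid if cross_spread else (bid + ask) / 2)
```

To compare the two conventions, run the function on the same trade list once with `cross_spread=False` and once with `cross_spread=True`.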


Event-Study Analysis 


ECONOMISTS ARE FREQUENTLY ASKED to measure the effect of an economic 
event on the value of a firm. On the surface this seems like a difficult 
task, but a measure can be constructed easily using financial market data 
in an event study. The usefulness of such a study comes from the fact 
that, given rationality in the marketplace, the effect of an event will be 
reflected immediately in asset prices. Thus the event's economic impact 
can be measured using asset prices observed over a relatively short time 
period. In contrast, direct measures may require many months or even
years of observation. 

The general applicability of the event-study methodology has led to
its wide use. In the academic accounting and finance field, event-study
methodology has been applied to a variety of firm-specific and economy-
wide events. Some examples include mergers and acquisitions, earnings an-
nouncements, issues of new debt or equity, and announcements of macroe-
conomic variables such as the trade deficit.¹ However, applications in other
fields are also abundant. For example, event studies are used in the field of
law and economics to measure the impact on the value of a firm of a change
in the regulatory environment,² and in legal-liability cases event studies are
used to assess damages.³ In most applications, the focus is the effect of an
event on the price of a particular class of securities of the firm, most often
common equity. In this chapter the methodology will be discussed in terms
of common stock applications. However, the methodology can be applied
to debt securities with little modification.

Event studies have a long history. Perhaps the first published study is
Dolley (1933). Dolley examined the price effects of stock splits, studying
nominal price changes at the time of the split. Using a sample of 95 splits
from 1921 to 1931, he found that the price increased in 57 of the cases and
the price declined in only 26 instances. There was no effect in the other 12
cases. Over the decades from the early 1930s until the late 1960s the level of
sophistication of event studies increased. Myers and Bakay (1948), Barker
(1956, 1957, 1958), and Ashley (1962) are examples of studies during this
time period. The improvements include removing general stock market
price movements and separating out confounding events. In the late 1960s
seminal studies by Ball and Brown (1968) and Fama, Fisher, Jensen, and
Roll (1969) introduced the methodology that is essentially still in use today.

Ball and Brown considered the information content of earnings, and Fama,
Fisher, Jensen, and Roll studied the effects of stock splits after removing the
effects of simultaneous dividend increases.

¹We will further discuss the first three examples later in the chapter. McQueen and Roley
(1993) provide an illustration using macroeconomic news announcements.

²See Schwert (1981).

³See Mitchell and Netter (1994).

In the years since these pioneering studies, several modifications of the
basic methodology have been suggested. These modifications handle com-
plications arising from violations of the statistical assumptions used in the
early work, and they can accommodate more specific hypotheses. Brown
and Warner (1980, 1985) are useful papers that discuss the practical im-
portance of many of these modifications. The 1980 paper considers imple-
mentation issues for data sampled at a monthly interval and the 1985 paper
deals with issues for daily data.

This chapter explains the econometric methodology of event studies.
Section 4.1 briefly outlines the procedure for conducting an event study.
Section 4.2 sets up an illustrative example of an event study. Central to
any event study is the measurement of the abnormal return. Section 4.3
details the first step, measuring the normal performance. Section 4.4
deals with the necessary tools for calculating the abnormal return, mak-
ing statistical inferences about these returns, and aggregating over many
event observations. In Sections 4.3 and 4.4 the discussion maintains the
null hypothesis that the event has no impact on the distribution of returns.
Section 4.5 discusses modifying the null hypothesis to focus only on the
mean of the return distribution. Section 4.6 analyzes the power of an
event study. Section 4.7 presents a nonparametric approach to event stud-
ies which eliminates the need for parametric structure. In some cases theory
provides hypotheses concerning the relation between the magnitude of the
event abnormal return and firm characteristics. In Section 4.8 we consider
cross-sectional regression models which are useful to investigate such hy-
potheses. Section 4.9 considers some further issues in event-study design
and Section 4.10 concludes.


4.1 Outline of an Event Study 


At the outset it is useful to give a brief outline of the structure of an event
study. While there is no unique structure, the analysis can be viewed
as having seven steps:


1. Event definition. The initial task of conducting an event study is to de-
fine the event of interest and identify the event window, the period over
which the security prices of the firms involved in this event will be examined.
For example, if one is looking at the information content of
an earnings announcement with daily data, the event will be the earn-
ings announcement and the event window might be the one day of the
announcement. In practice, the event window is often expanded to
two days, the day of the announcement and the day after the announce-
ment. This is done to capture the price effects of announcements which
occur after the stock market closes on the announcement day. The pe-
riod prior to or after the event may also be of interest and included
separately in the analysis. For example, in the earnings-announcement
case, the market may acquire information about the earnings prior to
the actual announcement, and one can investigate this possibility by
examining pre-event returns.

2. Selection criteria. After identifying the event of interest, it is necessary
to determine the selection criteria for the inclusion of a given firm in
the study. The criteria may involve restrictions imposed by data avail-
ability, such as listing on the NYSE or AMEX, or restrictions
such as membership in a specific industry. At this stage it is useful to
summarize some characteristics of the data sample (e.g., firm market
capitalization, industry representation, distribution of events through
time) and note any potential biases which may have been introduced
through the sample selection.

3. Normal and abnormal returns. To appraise the event's impact we require a measure of the abnormal return. The abnormal return is the actual ex post return of the security over the event window minus the normal return of the firm over the event window. The normal return is defined as the return that would be expected if the event did not take place. For each firm $i$ and event date $t$ we have

$$\epsilon^*_{it} = R_{it} - E[R_{it} \mid X_t], \qquad (4.1.1)$$

where $\epsilon^*_{it}$, $R_{it}$, and $E[R_{it} \mid X_t]$ are the abnormal, actual, and normal returns, respectively, for time period $t$. $X_t$ is the conditioning information for the normal performance model. There are two common choices for modeling the normal return: the constant-mean-return model, where $X_t$ is a constant, and the market model, where $X_t$ is the market return. The constant-mean-return model, as the name implies, assumes that the mean return of a given security is constant through time. The market model assumes a stable linear relation between the market return and the security return.
4. Estimation procedure. Once a normal performance model has been selected, the parameters of the model must be estimated using a subset of the data known as the estimation window. The most common choice, when feasible, is to use the period prior to the event window for the estimation window. For example, in an event study using daily data and the market model, the market-model parameters could be estimated over the 120 days prior to the event. Generally the event period itself is not included in the estimation period, to prevent the event from influencing the normal performance model parameter estimates.

5. Testing procedure. With the parameter estimates for the normal performance model, the abnormal returns can be calculated. Next, we need to design the testing framework for the abnormal returns. Important considerations are defining the null hypothesis and determining the techniques for aggregating the abnormal returns of individual firms.

6. Empirical results. The presentation of the empirical results follows the formulation of the econometric design. In addition to presenting the basic empirical results, the presentation of diagnostics can be fruitful. Occasionally, especially in studies with a limited number of event observations, the empirical results can be heavily influenced by one or two firms. Knowledge of this is important for gauging the importance of the results.

7. Interpretation and conclusions. Ideally the empirical results will lead to 
insights about the mechanisms by which the event affects security prices. 
Additional analysis may be included to distinguish between competing 
explanations. 




4.2 An Example of an Event Study 


The Financial Accounting Standards Board (FASB) and the Securities and Exchange Commission strive to set reporting regulations so that financial statements and related information releases are informative about the value of the firm. In setting standards, the information content of the financial disclosures is of interest. Event studies provide an ideal tool for examining the information content of the disclosures.

In this section we describe an example selected to illustrate the event-study methodology. One particular type of disclosure, quarterly earnings announcements, is considered. We investigate the information content of quarterly earnings announcements for the thirty firms in the Dow Jones Industrial Index over the five-year period from January 1989 to December 1993. These announcements correspond to the quarterly earnings for the last quarter of 1988 through the third quarter of 1993. The five years of data for thirty firms provide a total sample of 600 announcements. For each firm and quarter, three pieces of information are compiled: the date of the announcement, the actual announced earnings, and a measure of the expected earnings. The source of the date of the announcement is Datastream, and the source of the actual earnings is Compustat.

If earnings announcements convey information to investors, one would expect the announcement's impact on the market's valuation of the firm's equity to depend on the magnitude of the unexpected component of the announcement. Thus a measure of the deviation of the actual announced earnings from the market's prior expectation is required. We use the mean quarterly earnings forecast from the Institutional Brokers Estimate System (I/B/E/S) to proxy for the market's expectation of earnings. I/B/E/S compiles forecasts from analysts for a large number of companies and reports summary statistics each month. The mean forecast is taken from the last month of the quarter. For example, the mean third-quarter forecast from September 1990 is used as the measure of expected earnings for the third quarter of 1990.

In order to examine the impact of the earnings announcement on the value of the firm's equity, we assign each announcement to one of three categories: good news, no news, or bad news. We categorize each announcement using the deviation of the actual earnings from the expected earnings. If the actual exceeds expected by more than 2.5%, the announcement is designated as good news, and if the actual is more than 2.5% less than expected, the announcement is designated as bad news. Those announcements where the actual earnings fall within the 5% range centered on the expected earnings are designated as no news. Of the 600 announcements, 189 are good news, 173 are no news, and the remaining 238 are bad news.
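The classification rule above can be sketched in Python as follows. This is an illustrative sketch, not the procedure used to build the study's sample: it assumes the deviation is measured relative to the absolute value of the I/B/E/S mean forecast, which the text does not spell out, and the function name `classify_announcement` and the sample numbers are hypothetical.

```python
def classify_announcement(actual: float, expected: float, band: float = 0.025) -> str:
    """Label an earnings announcement as good, no, or bad news.

    Assumption: the deviation is measured relative to |expected|; the text does
    not say how zero or negative forecasts are handled.
    """
    if expected == 0:
        raise ValueError("relative deviation is undefined for a zero forecast")
    deviation = (actual - expected) / abs(expected)
    if deviation > band:        # actual exceeds expected by more than 2.5%
        return "good news"
    if deviation < -band:       # actual is more than 2.5% below expected
        return "bad news"
    return "no news"            # within the 5% band centered on expected


# Hypothetical (actual, expected) earnings-per-share pairs, not from the study.
sample = [(1.10, 1.00), (1.01, 1.00), (0.90, 1.00)]
labels = [classify_announcement(a, e) for a, e in sample]
print(labels)  # ['good news', 'no news', 'bad news']
```

Dividing by the absolute forecast keeps the sign of the deviation meaningful when the expected earnings are negative.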

With the announcements categorized, the next step is to specify the 
sampling interval, event window, and estimation window that will be used 
to analyze the behavior of firms’ equity returns. For this example we set the 
sampling interval to one day; thus daily stock returns are used. We choose a 
41-day event window, consisting of 20 pre-event days, the event day, and 20
post-event days. For each announcement we use the 250-trading-day period 
prior to the event window as the estimation window. After we present the 
methodology of an event study, we use this example as an illustration. 


4.3 Models for Measuring Normal Performance

A number of approaches are available to calculate the normal return of a given security. The approaches can be loosely grouped into two categories: statistical and economic. Models in the first category follow from statistical assumptions concerning the behavior of asset returns and do not depend on any economic arguments. In contrast, models in the second category rely on assumptions concerning investors' behavior and are not based solely on statistical assumptions. It should, however, be noted that to use economic models in practice it is necessary to add statistical assumptions. Thus the potential advantage of economic models is not the absence of statistical assumptions, but the opportunity to calculate more precise measures of the normal return using economic restrictions.

For the statistical models, it is conventional to assume that asset returns are jointly multivariate normal and independently and identically distributed through time. Formally, we have:


(A1) Let $R_t$ be an $(N \times 1)$ vector of asset returns for calendar time period $t$. $R_t$ is independently multivariate normally distributed with mean $\mu$ and covariance matrix $\Omega$ for all $t$.


This distributional assumption is sufficient for the constant-mean-return model and the market model to be correctly specified and permits the development of exact finite-sample distributional results for the estimators and statistics. Inferences using the normal return models are robust to deviations from the assumption. Further, we can explicitly accommodate deviations using a generalized method of moments framework.


4.3.1 Constant-Mean-Return Model


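Let $\mu_i$, the $i$th element of $\mu$, be the mean return for asset $i$. Then the constant-mean-return model is

$$R_{it} = \mu_i + \xi_{it}, \qquad E[\xi_{it}] = 0, \qquad \operatorname{Var}[\xi_{it}] = \sigma^2_{\xi_i}, \qquad (4.3.1)$$

where $R_{it}$, the $i$th element of $R_t$, is the period-$t$ return on security $i$, $\xi_{it}$ is the disturbance term, and $\sigma^2_{\xi_i}$ is the $(i, i)$ element of $\Omega$.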

Although the constant-mean-return model is perhaps the simplest model, Brown and Warner (1980, 1985) find it often yields results similar to those of more sophisticated models. This lack of sensitivity to the model choice can be attributed to the fact that the variance of the abnormal return is frequently not reduced much by choosing a more sophisticated model. When using daily data the model is typically applied to nominal returns. With monthly data the model can be applied to real returns or excess returns (the return in excess of the nominal risk-free return, generally measured using the US Treasury bill) as well as nominal returns.
4.3.2 Market Model


The market model is a statistical model which relates the return of any given security to the return of the market portfolio. The model's linear specification follows from the assumed joint normality of asset returns.¹ For any security $i$ we have

$$R_{it} = \alpha_i + \beta_i R_{mt} + \epsilon_{it}, \qquad E[\epsilon_{it}] = 0, \qquad \operatorname{Var}[\epsilon_{it}] = \sigma^2_{\epsilon_i}, \qquad (4.3.2)$$

where $R_{it}$ and $R_{mt}$ are the period-$t$ returns on security $i$ and the market portfolio, respectively, and $\epsilon_{it}$ is the zero-mean disturbance term. $\alpha_i$, $\beta_i$, and $\sigma^2_{\epsilon_i}$ are the parameters of the market model. In applications a broad-based stock index is used for the market portfolio, with the S&P 500 index, the CRSP value-weighted index, and the CRSP equal-weighted index being popular choices.

The market model represents a potential improvement over the constant-mean-return model. By removing the portion of the return that is related to variation in the market's return, the variance of the abnormal return is reduced. This can lead to increased ability to detect event effects. The benefit from using the market model will depend upon the $R^2$ of the market-model regression. The higher the $R^2$, the greater is the variance reduction of the abnormal return, and the larger is the gain. See Section 4.4.4 for more discussion of this point.


4.3.3 Other Statistical Models 


A number of other statistical models have been proposed for modeling the normal return. A general type of statistical model is the factor model. Factor models potentially provide the benefit of reducing the variance of the abnormal return by explaining more of the variation in the normal return. Typically the factors are portfolios of traded securities. The market model is an example of a one-factor model, but in a multifactor model one might include industry indexes in addition to the market. Sharpe (1970) and Sharpe, Alexander, and Bailey (1995) discuss index models with factors based on industry classification. Another variant of a factor model is a procedure which calculates the abnormal return by taking the difference between the actual return and the return on a portfolio of firms of similar size, where size is measured by market value of equity. In this approach typically ten size groups are considered and the loading on the size portfolios is restricted to unity. This procedure implicitly assumes that expected return is directly related to the market value of equity.

¹The specification actually requires the asset weights in the market portfolio to remain constant. However, changes over time in the market portfolio weights are small enough that they have little effect on empirical work.

In practice the gains from employing multifactor models for event studies are limited. The reason for this is that the marginal explanatory power of additional factors beyond the market factor is small, and hence there is little reduction in the variance of the abnormal return. The variance reduction will typically be greatest in cases where the sample firms have a common characteristic, for example, they are all members of one industry or they are all firms concentrated in one market capitalization group. In these cases the use of a multifactor model warrants consideration.

Sometimes limited data availability may dictate the use of a restricted model such as the market-adjusted-return model. For some events it is not feasible to have a pre-event estimation period for the normal model parameters, and a market-adjusted abnormal return is used. The market-adjusted-return model can be viewed as a restricted market model with $\alpha_i$ constrained to be 0 and $\beta_i$ constrained to be 1. Since the model coefficients are prespecified, an estimation period is not required to obtain parameter estimates. This model is often used to study the underpricing of initial public offerings.² A general recommendation is to use such restricted models only as a last resort, and to keep in mind that biases may arise if the restrictions are false.
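As a sketch, the market-adjusted abnormal return is the security return minus the market return on the same day, which is the market model with $\alpha_i = 0$ and $\beta_i = 1$ imposed. The function name below is hypothetical, and the inputs are assumed to be aligned daily return series.

```python
import numpy as np


def market_adjusted_abnormal_returns(stock_returns, market_returns):
    """Market-adjusted-return model: the market model with alpha_i = 0 and
    beta_i = 1 imposed, so no estimation window is required."""
    r = np.asarray(stock_returns, dtype=float)
    rm = np.asarray(market_returns, dtype=float)
    if r.shape != rm.shape:
        raise ValueError("stock and market return series must be aligned")
    return r - rm  # R_it - (0 + 1 * R_mt)


# Hypothetical daily returns for a newly listed stock and the market.
print(market_adjusted_abnormal_returns([0.03, -0.01], [0.01, -0.02]))
```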


4.3.4 Economic Models 


Economic models restrict the parameters of statistical models to provide more constrained normal return models. Two common economic models which provide restrictions are the Capital Asset Pricing Model (CAPM) and exact versions of the Arbitrage Pricing Theory (APT). The CAPM, due to Sharpe (1964) and Lintner (1965b), is an equilibrium theory where the expected return of a given asset is a linear function of its covariance with the return of the market portfolio. The APT, due to Ross (1976), is an asset pricing theory where, in the absence of asymptotic arbitrage, the expected return of a given asset is determined by its covariances with multiple factors. Chapters 5 and 6 provide extensive treatments of these two theories.

The Capital Asset Pricing Model was commonly used in event studies during the 1970s. During the last ten years, however, deviations from the CAPM have been discovered, and this casts doubt on the validity of the restrictions imposed by the CAPM on the market model. Since these restrictions can be relaxed at little cost by using the market model, the use of the CAPM in event studies has almost ceased.

Some studies have used multifactor normal performance models motivated by the Arbitrage Pricing Theory. The APT can be made to fit the cross-section of mean returns, as shown by Fama and French (1996a) and others, so a properly chosen APT model does not impose false restrictions on mean returns. On the other hand, the use of the APT complicates the implementation of an event study and has little practical advantage relative to the unrestricted market model. See, for example, Brown and Weinstein (1985). There seems to be no good reason to use an economic model rather than a statistical model in an event study.

²See Ritter (1991) for an example.

Figure 4.1. Time Line for an Event Study


4.4 Measuring and Analyzing Abnormal Returns 


In this section we consider the problem of measuring and analyzing abnor- 
mal returns. We use the market model as the normal performance return 
model, but the analysis is virtually identical for the constant-mean-return 
model. 

We first define some notation. We index returns in event time using $\tau$. Defining $\tau = 0$ as the event date, $\tau = T_1 + 1$ to $\tau = T_2$ represents the event window, and $\tau = T_0 + 1$ to $\tau = T_1$ constitutes the estimation window. Let $L_1 = T_1 - T_0$ and $L_2 = T_2 - T_1$ be the length of the estimation window and the event window, respectively. If the event being considered is an announcement on a given date, then $T_2 = T_1 + 1$ and $L_2 = 1$. If applicable, the post-event window will be from $\tau = T_2 + 1$ to $\tau = T_3$ and its length is $L_3 = T_3 - T_2$. The timing sequence is illustrated on the time line in Figure 4.1.
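The event-time bookkeeping can be sketched with a small Python helper. The class and function names are hypothetical. The example instantiates the windows of the earnings study in Section 4.2: a 41-day event window ($\tau = -20, \dots, 20$) gives $T_1 = -21$ and $T_2 = 20$, and a 250-day estimation window gives $T_0 = -271$. It assumes returns are stored in a sequence indexed by trading day, with the event date at a known position.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EventTimeline:
    """Event-time boundaries T0 < T1 < T2 (< T3); tau = 0 is the event date."""
    T0: int
    T1: int
    T2: int
    T3: Optional[int] = None  # end of an optional post-event window

    @property
    def L1(self) -> int:  # estimation-window length
        return self.T1 - self.T0

    @property
    def L2(self) -> int:  # event-window length
        return self.T2 - self.T1

    def estimation_taus(self) -> range:  # tau = T0 + 1, ..., T1
        return range(self.T0 + 1, self.T1 + 1)

    def event_taus(self) -> range:  # tau = T1 + 1, ..., T2
        return range(self.T1 + 1, self.T2 + 1)

    def post_event_taus(self) -> range:  # tau = T2 + 1, ..., T3, if defined
        return range(0) if self.T3 is None else range(self.T2 + 1, self.T3 + 1)


def slice_by_event_time(returns: Sequence[float], event_index: int, taus: range) -> list:
    """Select the observations at the given event times.

    Assumes `returns` is indexed by trading day and `event_index` is the
    position of the event date (tau = 0) in that sequence.
    """
    positions = [event_index + tau for tau in taus]
    if positions and (positions[0] < 0 or positions[-1] >= len(returns)):
        raise IndexError("window extends beyond the available data")
    return [returns[p] for p in positions]


# Earnings example: 41-day event window and 250-day estimation window.
timeline = EventTimeline(T0=-271, T1=-21, T2=20)
print(timeline.L1, timeline.L2)  # 250 41
```

Keeping the boundaries as $T_0, T_1, T_2$ rather than window lengths makes it easy to check that the estimation and event windows do not overlap.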

We interpret the abnormal return over the event window as a measure of the impact of the event on the value of the firm (or its equity). Thus, the methodology implicitly assumes that the event is exogenous with respect to the change in market value of the security. In other words, the revision in value of the firm is caused by the event. In most cases this methodology is appropriate, but there are exceptions. There are examples where an event is triggered by the change in the market value of a security, in which case the event is endogenous. For these cases, the usual interpretation will be incorrect.

It is typical for the estimation window and the event window not to overlap. This design provides estimators for the parameters of the normal return model which are not influenced by the event-related returns. Including the event window in the estimation of the normal model parameters could lead to the event returns having a large influence on the normal return measure. In this situation both the normal returns and the abnormal returns would reflect the impact of the event. This would be problematic since the methodology is built around the assumption that the event impact is captured by the abnormal returns. In Section 4.5 we consider expanding the null hypothesis to accommodate changes in the risk of a firm around the event. In this case an estimation framework which uses the event-window returns will be required.


4.4.1 Estimation of the Market Model


Recall that the market model for security $i$ and observation $\tau$ in event time is

$$R_{i\tau} = \alpha_i + \beta_i R_{m\tau} + \epsilon_{i\tau}. \qquad (4.4.1)$$


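The estimation-window observations can be expressed as a regression system,

$$\mathbf{R}_i = \mathbf{X}_i \boldsymbol{\theta}_i + \boldsymbol{\epsilon}_i, \qquad (4.4.2)$$

where $\mathbf{R}_i = [R_{i,T_0+1} \cdots R_{i,T_1}]'$ is an $(L_1 \times 1)$ vector of estimation-window returns, $\mathbf{X}_i = [\boldsymbol{\iota} \;\; \mathbf{R}_m]$ is an $(L_1 \times 2)$ matrix with a vector of ones in the first column and the vector of market return observations $\mathbf{R}_m = [R_{m,T_0+1} \cdots R_{m,T_1}]'$ in the second column, and $\boldsymbol{\theta}_i = [\alpha_i \;\; \beta_i]'$ is the $(2 \times 1)$ parameter vector. $\mathbf{X}_i$ has a subscript because the estimation window may have timing that is specific to firm $i$. Under general conditions ordinary least squares (OLS) is a consistent estimation procedure for the market-model parameters. Further, given the assumptions of Section 4.3, OLS is efficient. The OLS estimators of the market-model parameters using an estimation window of $L_1$ observations are

$$\hat{\boldsymbol{\theta}}_i = (\mathbf{X}_i'\mathbf{X}_i)^{-1}\mathbf{X}_i'\mathbf{R}_i, \qquad (4.4.3)$$

$$\hat{\sigma}^2_{\epsilon_i} = \frac{1}{L_1 - 2}\,\hat{\boldsymbol{\epsilon}}_i'\hat{\boldsymbol{\epsilon}}_i, \qquad (4.4.4)$$

$$\hat{\boldsymbol{\epsilon}}_i = \mathbf{R}_i - \mathbf{X}_i\hat{\boldsymbol{\theta}}_i, \qquad (4.4.5)$$

$$\operatorname{Var}[\hat{\boldsymbol{\theta}}_i] = (\mathbf{X}_i'\mathbf{X}_i)^{-1}\sigma^2_{\epsilon_i}. \qquad (4.4.6)$$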
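A minimal NumPy sketch of the estimators (4.4.3) to (4.4.6) for one security, assuming aligned estimation-window arrays of security and market returns. The function name is hypothetical. It forms $(\mathbf{X}_i'\mathbf{X}_i)^{-1}$ explicitly to mirror the formulas; a production implementation would usually prefer `np.linalg.lstsq` or a QR decomposition for numerical stability.

```python
import numpy as np


def estimate_market_model(R_i, R_m):
    """OLS market-model estimates over an estimation window of L1 observations.

    Returns theta_hat = [alpha_hat, beta_hat] (4.4.3), sigma2_hat (4.4.4),
    the residual vector (4.4.5), and Var[theta_hat] from (4.4.6) with
    sigma2_hat substituted for sigma^2.
    """
    R_i = np.asarray(R_i, dtype=float)
    R_m = np.asarray(R_m, dtype=float)
    L1 = R_i.shape[0]
    X = np.column_stack([np.ones(L1), R_m])   # X_i = [iota  R_m], shape (L1, 2)
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_hat = XtX_inv @ X.T @ R_i           # (4.4.3)
    resid = R_i - X @ theta_hat               # (4.4.5)
    sigma2_hat = resid @ resid / (L1 - 2)     # (4.4.4): L1 - 2 degrees of freedom
    var_theta_hat = XtX_inv * sigma2_hat      # (4.4.6), estimated
    return theta_hat, sigma2_hat, resid, var_theta_hat


# Simulated estimation window (illustrative parameters only).
rng = np.random.default_rng(42)
R_m = rng.normal(0.0005, 0.01, size=250)
R_i = 0.0002 + 1.2 * R_m + rng.normal(0.0, 0.015, size=250)
theta_hat, sigma2_hat, resid, var_theta_hat = estimate_market_model(R_i, R_m)
print(theta_hat, sigma2_hat)
```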


We next show how to use these OLS estimators to measure the statistical properties of abnormal returns. First we consider the abnormal return properties of a given security and then we aggregate across securities.


4.4.2 Statistical Properties of Abnormal Returns 
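Given the market-model parameter estimates, we can measure and analyze the abnormal returns. Let $\hat{\boldsymbol{\epsilon}}^*_i$ be the $(L_2 \times 1)$ sample vector of abnormal returns for firm $i$ from the event window, $T_1 + 1$ to $T_2$. Then using the market model to measure the normal return and the OLS estimators from (4.4.3), we have for the abnormal return vector

$$\hat{\boldsymbol{\epsilon}}^*_i = \mathbf{R}^*_i - \hat{\alpha}_i\boldsymbol{\iota} - \hat{\beta}_i\mathbf{R}^*_m = \mathbf{R}^*_i - \mathbf{X}^*_i\hat{\boldsymbol{\theta}}_i, \qquad (4.4.7)$$

where $\mathbf{R}^*_i = [R_{i,T_1+1} \cdots R_{i,T_2}]'$ is an $(L_2 \times 1)$ vector of event-window returns, $\mathbf{X}^*_i = [\boldsymbol{\iota} \;\; \mathbf{R}^*_m]$ is an $(L_2 \times 2)$ matrix with a vector of ones in the first column and the vector of market return observations $\mathbf{R}^*_m = [R_{m,T_1+1} \cdots R_{m,T_2}]'$ in the second column, and $\hat{\boldsymbol{\theta}}_i = [\hat{\alpha}_i \;\; \hat{\beta}_i]'$ is the $(2 \times 1)$ parameter vector estimate. Conditional on the market return over the event window, the abnormal returns will be jointly normally distributed with a zero conditional mean and conditional covariance matrix $\mathbf{V}_i$, as shown in (4.4.8) and (4.4.9), respectively:

$$\begin{aligned}
E[\hat{\boldsymbol{\epsilon}}^*_i \mid \mathbf{X}^*_i] &= E[\mathbf{R}^*_i - \mathbf{X}^*_i\hat{\boldsymbol{\theta}}_i \mid \mathbf{X}^*_i] \\
&= E[(\mathbf{R}^*_i - \mathbf{X}^*_i\boldsymbol{\theta}_i) - \mathbf{X}^*_i(\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i) \mid \mathbf{X}^*_i] \\
&= \mathbf{0}, \qquad (4.4.8)
\end{aligned}$$

$$\begin{aligned}
\mathbf{V}_i &= E[\hat{\boldsymbol{\epsilon}}^*_i\hat{\boldsymbol{\epsilon}}^{*\prime}_i \mid \mathbf{X}^*_i] \\
&= E\big[[\boldsymbol{\epsilon}^*_i - \mathbf{X}^*_i(\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i)][\boldsymbol{\epsilon}^*_i - \mathbf{X}^*_i(\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i)]' \mid \mathbf{X}^*_i\big] \\
&= E\big[\boldsymbol{\epsilon}^*_i\boldsymbol{\epsilon}^{*\prime}_i - \boldsymbol{\epsilon}^*_i(\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i)'\mathbf{X}^{*\prime}_i - \mathbf{X}^*_i(\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i)\boldsymbol{\epsilon}^{*\prime}_i + \mathbf{X}^*_i(\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i)(\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i)'\mathbf{X}^{*\prime}_i \mid \mathbf{X}^*_i\big] \\
&= \mathbf{I}\sigma^2_{\epsilon_i} + \mathbf{X}^*_i(\mathbf{X}_i'\mathbf{X}_i)^{-1}\mathbf{X}^{*\prime}_i\sigma^2_{\epsilon_i}, \qquad (4.4.9)
\end{aligned}$$

where $\mathbf{I}$ is the $(L_2 \times L_2)$ identity matrix. The cross terms vanish because the event-window disturbances $\boldsymbol{\epsilon}^*_i$ are independent of the estimation-window estimate $\hat{\boldsymbol{\theta}}_i$, and the last term uses (4.4.6).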

From (4.4.8) we see that the abnormal return vector, with an expectation of zero, is unbiased. The covariance matrix of the abnormal return vector from (4.4.9) has two parts. The first term in the sum is the variance due to the future disturbances and the second term is the additional variance due to the sampling error in $\hat{\boldsymbol{\theta}}_i$. This sampling error, which is common for all the elements of the abnormal return vector, will lead to serial correlation of the abnormal returns despite the fact that the true disturbances are independent through time. As the length of the estimation window $L_1$ becomes large, the second term will approach zero as the sampling error of the parameters vanishes, and the abnormal returns across time periods will become independent asymptotically.
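Continuing the sketch, the abnormal return vector (4.4.7) and its covariance matrix (4.4.9) can be computed directly. The function name is hypothetical; `X_est` is the estimation-window design matrix and `sigma2` stands in for $\sigma^2_{\epsilon_i}$, for which one would substitute $\hat\sigma^2_{\epsilon_i}$ from (4.4.4) in practice. The off-diagonal elements of the result are the serial correlation induced by the common sampling error in $\hat{\boldsymbol\theta}_i$.

```python
import numpy as np


def event_window_abnormal_returns(R_i_star, R_m_star, theta_hat, X_est, sigma2):
    """Abnormal returns (4.4.7) and their conditional covariance V_i (4.4.9).

    X_est is the (L1 x 2) estimation-window matrix [iota  R_m]; sigma2 stands in
    for sigma^2_eps (in practice, sigma2_hat from (4.4.4)).
    """
    R_i_star = np.asarray(R_i_star, dtype=float)
    L2 = R_i_star.shape[0]
    X_star = np.column_stack([np.ones(L2), np.asarray(R_m_star, dtype=float)])
    eps_star = R_i_star - X_star @ theta_hat                  # (4.4.7)
    XtX_inv = np.linalg.inv(X_est.T @ X_est)
    future_part = np.eye(L2) * sigma2                         # future disturbances
    sampling_part = X_star @ XtX_inv @ X_star.T * sigma2      # error in theta_hat
    return eps_star, future_part + sampling_part


# Illustrative inputs: a 5-day event window and a 100-day estimation window,
# with a hypothetical +2% event effect on the middle day.
rng = np.random.default_rng(1)
Rm_est = rng.normal(0.0, 0.01, size=100)
X_est = np.column_stack([np.ones(100), Rm_est])
theta_hat = np.array([0.0001, 1.0])
Rm_event = rng.normal(0.0, 0.01, size=5)
Ri_event = theta_hat[0] + theta_hat[1] * Rm_event + np.array([0, 0, 0.02, 0, 0])
eps_star, V = event_window_abnormal_returns(Ri_event, Rm_event, theta_hat, X_est, 1e-4)
```

Re-running with a longer estimation window shrinks `sampling_part` roughly in proportion to $1/L_1$, which is the asymptotic independence described above.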

Under the null hypothesis, $H_0$, that the given event has no impact on the mean or variance of returns, we can use (4.4.8) and (4.4.9) and the joint normality of the abnormal returns to draw inferences. Under $H_0$, for the vector of event-window sample abnormal returns we have

$$\hat{\boldsymbol{\epsilon}}^*_i \sim N(\mathbf{0}, \mathbf{V}_i). \qquad (4.4.10)$$

Equation (4.4.10) gives us the distribution for any single abnormal return observation. We next build on this result and consider the aggregation of abnormal returns.


4.4.3 Aggregation of Abnormal Returns


The abnormal return observations must be aggregated in order to draw overall inferences for the event of interest. The aggregation is along two dimensions: through time and across securities. We will first consider aggregation through time for an individual security and then will consider aggregation both across securities and through time.

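We introduce the cumulative abnormal return to accommodate multiple sampling intervals within the event window. Define $\widehat{CAR}_i(\tau_1, \tau_2)$ as the cumulative abnormal return for security $i$ from $\tau_1$ to $\tau_2$, where $T_1 < \tau_1 \le \tau_2 \le T_2$. Let $\boldsymbol{\gamma}$ be an $(L_2 \times 1)$ vector with ones in positions $\tau_1 - T_1$ to $\tau_2 - T_1$ and zeroes elsewhere. Then we have

$$\widehat{CAR}_i(\tau_1, \tau_2) \equiv \boldsymbol{\gamma}'\hat{\boldsymbol{\epsilon}}^*_i, \qquad (4.4.11)$$

$$\operatorname{Var}[\widehat{CAR}_i(\tau_1, \tau_2)] = \sigma^2_i(\tau_1, \tau_2) = \boldsymbol{\gamma}'\mathbf{V}_i\boldsymbol{\gamma}. \qquad (4.4.12)$$

It follows from (4.4.10) that under $H_0$,

$$\widehat{CAR}_i(\tau_1, \tau_2) \sim N\big(0, \sigma^2_i(\tau_1, \tau_2)\big). \qquad (4.4.13)$$

We can construct a test of $H_0$ for security $i$ from (4.4.13) using the standardized cumulative abnormal return,

$$\widehat{SCAR}_i(\tau_1, \tau_2) = \frac{\widehat{CAR}_i(\tau_1, \tau_2)}{\hat{\sigma}_i(\tau_1, \tau_2)}, \qquad (4.4.14)$$

where $\hat{\sigma}^2_i(\tau_1, \tau_2)$ is calculated with $\hat{\sigma}^2_{\epsilon_i}$ from (4.4.4) substituted for $\sigma^2_{\epsilon_i}$. Under the null hypothesis the distribution of $\widehat{SCAR}_i(\tau_1, \tau_2)$ is Student $t$ with $L_1 - 2$ degrees of freedom. From the properties of the Student $t$ distribution, the expectation of $\widehat{SCAR}_i(\tau_1, \tau_2)$ is 0 and the variance is $\frac{L_1 - 2}{L_1 - 4}$. For a large estimation window (for example, $L_1 > 30$), the distribution of $\widehat{SCAR}_i(\tau_1, \tau_2)$ will be well approximated by the standard normal.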
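The cumulative and standardized cumulative abnormal returns (4.4.11), (4.4.12), and (4.4.14) reduce to a few lines once $\hat{\boldsymbol\epsilon}^*_i$ and $\mathbf{V}_i$ are available. In this hypothetical helper the vector $\boldsymbol\gamma$ is built from the 1-based positions $\tau_1 - T_1$ to $\tau_2 - T_1$ given in the text, shifted to Python's 0-based indexing; the example inputs are made up for illustration.

```python
import numpy as np


def car_and_scar(eps_star, V, tau1, tau2, T1):
    """CAR_i (4.4.11), its variance gamma' V gamma (4.4.12), and SCAR_i (4.4.14).

    eps_star and V cover event times T1 + 1, ..., T2. Passing the V built with
    sigma2_hat gives the estimated variance used in SCAR_i.
    """
    eps_star = np.asarray(eps_star, dtype=float)
    L2 = eps_star.shape[0]
    if not (T1 < tau1 <= tau2 <= T1 + L2):
        raise ValueError("need T1 < tau1 <= tau2 <= T2")
    gamma = np.zeros(L2)
    # Text positions tau1 - T1 .. tau2 - T1 are 1-based; shift to 0-based slicing.
    gamma[tau1 - T1 - 1 : tau2 - T1] = 1.0
    car = gamma @ eps_star
    var_car = gamma @ V @ gamma
    return car, var_car, car / np.sqrt(var_car)


# Illustrative 5-day event window tau = -2, ..., 2 (so T1 = -3), CAR over days -1 to 1.
eps = [0.01, 0.02, 0.03, -0.01, 0.0]
car, var_car, scar = car_and_scar(eps, np.eye(5) * 1e-4, tau1=-1, tau2=1, T1=-3)
```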

The above result applies to a sample of one event and must be extended for the usual case where a sample of many event observations is aggregated. To aggregate across securities and through time, we assume that there is not any correlation across the abnormal returns of different securities. This will generally be the case if there is not any clustering, that is, there is not any overlap in the event windows of the included securities. The absence of any overlap and the maintained distributional assumptions imply that the abnormal returns and the cumulative abnormal returns will be independent across securities. Inferences with clustering will be discussed later.

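The individual securities' abnormal returns can be averaged using $\hat{\boldsymbol{\epsilon}}^*_i$ from (4.4.7). Given a sample of $N$ events, defining $\bar{\boldsymbol{\epsilon}}^*$ as the sample average of the $N$ abnormal return vectors, we have

$$\bar{\boldsymbol{\epsilon}}^* = \frac{1}{N}\sum_{i=1}^{N}\hat{\boldsymbol{\epsilon}}^*_i, \qquad (4.4.15)$$

$$\operatorname{Var}[\bar{\boldsymbol{\epsilon}}^*] = \mathbf{V} = \frac{1}{N^2}\sum_{i=1}^{N}\mathbf{V}_i. \qquad (4.4.16)$$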

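We can aggregate the elements of this average abnormal return vector through time using the same approach as we did for an individual security's vector. Define $\overline{CAR}(\tau_1, \tau_2)$ as the cumulative average abnormal return from $\tau_1$ to $\tau_2$, where $T_1 < \tau_1 \le \tau_2 \le T_2$, and $\boldsymbol{\gamma}$ again represents an $(L_2 \times 1)$ vector with ones in positions $\tau_1 - T_1$ to $\tau_2 - T_1$ and zeroes elsewhere. For the cumulative average abnormal return we have

$$\overline{CAR}(\tau_1, \tau_2) = \boldsymbol{\gamma}'\bar{\boldsymbol{\epsilon}}^*, \qquad (4.4.17)$$

$$\operatorname{Var}[\overline{CAR}(\tau_1, \tau_2)] = \bar{\sigma}^2(\tau_1, \tau_2) = \boldsymbol{\gamma}'\mathbf{V}\boldsymbol{\gamma}. \qquad (4.4.18)$$

Equivalently, to obtain $\overline{CAR}(\tau_1, \tau_2)$, we can aggregate using the sample cumulative abnormal return for each security $i$. For $N$ events we have

$$\overline{CAR}(\tau_1, \tau_2) = \frac{1}{N}\sum_{i=1}^{N}\widehat{CAR}_i(\tau_1, \tau_2), \qquad (4.4.19)$$

$$\operatorname{Var}[\overline{CAR}(\tau_1, \tau_2)] = \bar{\sigma}^2(\tau_1, \tau_2) = \frac{1}{N^2}\sum_{i=1}^{N}\sigma^2_i(\tau_1, \tau_2). \qquad (4.4.20)$$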
In (4.4.16), (4.4.18), and (4.4.20) we use the assumption that the event windows of the $N$ securities do not overlap to set the covariance terms to zero. Inferences about the cumulative abnormal returns can be drawn using

$$\overline{CAR}(\tau_1, \tau_2) \sim N\big(0, \bar{\sigma}^2(\tau_1, \tau_2)\big), \qquad (4.4.21)$$

since under the null hypothesis the expectation of the abnormal returns is zero. In practice, since $\bar{\sigma}^2(\tau_1, \tau_2)$ is unknown, we can use $\hat{\bar{\sigma}}^2(\tau_1, \tau_2) = \frac{1}{N^2}\sum_{i=1}^{N}\hat{\sigma}^2_i(\tau_1, \tau_2)$ as a consistent estimator and proceed to test $H_0$ using

$$J_1 = \frac{\overline{CAR}(\tau_1, \tau_2)}{\big[\hat{\bar{\sigma}}^2(\tau_1, \tau_2)\big]^{1/2}} \overset{a}{\sim} N(0, 1). \qquad (4.4.22)$$

This distributional result is for large samples of events and is not exact because an estimator of the variance appears in the denominator.

A second method of aggregation is to give equal weighting to the individual $\widehat{SCAR}_i$'s. Defining $\overline{SCAR}(\tau_1, \tau_2)$ as the average over $N$ securities from event time $\tau_1$ to $\tau_2$, we have

$$\overline{SCAR}(\tau_1, \tau_2) = \frac{1}{N}\sum_{i=1}^{N}\widehat{SCAR}_i(\tau_1, \tau_2). \qquad (4.4.23)$$

Assuming that the event windows of the $N$ securities do not overlap in calendar time, under $H_0$, $\overline{SCAR}(\tau_1, \tau_2)$ will be normally distributed in large samples with a mean of zero and variance $\frac{L_1 - 2}{N(L_1 - 4)}$. We can test the null hypothesis using

$$J_2 = \left(\frac{N(L_1 - 4)}{L_1 - 2}\right)^{1/2}\overline{SCAR}(\tau_1, \tau_2) \overset{a}{\sim} N(0, 1). \qquad (4.4.24)$$

When doing an event study one will have to choose between using $J_1$ or $J_2$ for the test statistic. One would like to choose the statistic with higher power, and this will depend on the alternative hypothesis. If the true abnormal return is constant across securities, then the better choice will give more weight to the securities with the lower abnormal return variance, which is what $J_2$ does. On the other hand, if the true abnormal return is larger for securities with higher variance, then the better choice will give equal weight to the realized cumulative abnormal return of each security, which is what $J_1$ does. In most studies, the results are not likely to be sensitive to the choice of $J_1$ versus $J_2$ because the variance of the CAR is of a similar magnitude across securities.
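Both aggregate statistics can be computed from the per-event $\widehat{CAR}_i$ values and their estimated variances $\hat\sigma^2_i(\tau_1,\tau_2)$, as in the hypothetical helper below. It assumes non-overlapping event windows, as (4.4.20) and (4.4.24) do. The weighting contrast discussed above is visible in the code: $J_1$ averages raw CARs, while $J_2$ standardizes each CAR before averaging.

```python
import numpy as np


def j1_j2(cars, car_vars, L1):
    """Aggregate test statistics J1 (4.4.22) and J2 (4.4.24) across N events.

    cars[i] is CAR_i(tau1, tau2) and car_vars[i] its estimated variance
    sigma_hat_i^2(tau1, tau2); event windows are assumed not to overlap.
    """
    cars = np.asarray(cars, dtype=float)
    car_vars = np.asarray(car_vars, dtype=float)
    N = cars.shape[0]
    car_bar = cars.mean()                                  # (4.4.19)
    var_car_bar = car_vars.sum() / N**2                    # estimate of (4.4.20)
    J1 = car_bar / np.sqrt(var_car_bar)                    # (4.4.22)
    scar_bar = (cars / np.sqrt(car_vars)).mean()           # (4.4.23)
    J2 = np.sqrt(N * (L1 - 4) / (L1 - 2)) * scar_bar       # (4.4.24)
    return J1, J2


# Two hypothetical events with a 250-day estimation window.
J1, J2 = j1_j2([0.02, 0.04], [1e-4, 4e-4], L1=250)
```

With these inputs both standardized CARs equal 2, and $J_2$ (about 2.82) exceeds $J_1$ (about 2.68) because the variance in $J_1$ is dominated by the noisier second event.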


4.4.4 Sensitivity to Normal Return Model


We have developed results using the market model as the normal return model. As previously noted, using the market model as opposed to the constant-mean-return model will lead to a reduction in the abnormal return variance. This point can be shown by comparing the abnormal return variances. For this illustration we take the normal return model parameters as given.


The variance of the abnormal return for the market model is

$$\begin{aligned}
\sigma^2_{\epsilon_i} &= \operatorname{Var}[R_{it} - \alpha_i - \beta_i R_{mt}] \\
&= \operatorname{Var}[R_{it}] - \beta_i^2 \operatorname{Var}[R_{mt}] \\
&= (1 - R_i^2)\operatorname{Var}[R_{it}], \qquad (4.4.25)
\end{aligned}$$

where $R_i^2$ is the $R^2$ of the market-model regression for security $i$. For the constant-mean-return model, the variance of the abnormal return $\xi_{it}$ is the variance of the unconditional return, $\operatorname{Var}[R_{it}]$, that is,

$$\sigma^2_{\xi_i} = \operatorname{Var}[R_{it} - \mu_i] = \operatorname{Var}[R_{it}]. \qquad (4.4.26)$$

Combining (4.4.25) and (4.4.26) we have

$$\sigma^2_{\epsilon_i} = (1 - R_i^2)\,\sigma^2_{\xi_i}. \qquad (4.4.27)$$

Since $R_i^2$ lies between zero and one, the variance of the abnormal return using the market model will be less than or equal to the abnormal return variance using the constant-mean-return model. This lower variance for the market model will carry over into all the aggregate abnormal return measures. As a result, using the market model can lead to more precise inferences. The gains will be greatest for a sample of securities with high market-model $R^2$ statistics.

In principle further increases in $R^2$ could be achieved by using a multifactor model. In practice, however, the gains in $R^2$ from adding additional factors are usually small.
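The identity behind (4.4.27) can be checked numerically. The sketch below uses simulated returns (the parameters are illustrative, not estimates from any data set) and an in-sample OLS fit, for which the residual variance equals $(1 - R^2)$ times the variance of the demeaned return exactly; with the population parameters taken as given, as in the text, the same relation holds for the true variances.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical simulated daily returns; parameters are illustrative only.
R_m = rng.normal(0.0005, 0.01, size=250)
R_i = 0.0002 + 1.1 * R_m + rng.normal(0.0, 0.012, size=250)

beta_hat, alpha_hat = np.polyfit(R_m, R_i, 1)   # polyfit returns [slope, intercept]
eps = R_i - alpha_hat - beta_hat * R_m          # market-model abnormal returns
xi = R_i - R_i.mean()                           # constant-mean-return abnormal returns
R2 = np.corrcoef(R_i, R_m)[0, 1] ** 2           # R^2 of the one-factor regression

# In-sample analogue of (4.4.27): var(eps) = (1 - R^2) * var(xi)
print(np.var(eps), (1 - R2) * np.var(xi))
```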


4.4.5 CARs for the Earnings-Announcement Example 


The earnings-announcement example illustrates the use of sample abnormal returns and sample cumulative abnormal returns. Table 4.1 presents the abnormal returns averaged across the 30 firms as well as the averaged cumulative abnormal return for each of the three earnings news categories. Two normal return models are considered: the market model and, for comparison, the constant-mean-return model. Plots of the cumulative abnormal returns are also included, with the CARs from the market model in Figure 4.2a and the CARs from the constant-mean-return model in Figure 4.2b.

Notes to Table 4.1: The sample consists of a total of 600 quarterly announcements for the thirty companies in the Dow Jones Industrial Index for the five-year period January 1989 to December 1993. Two models are considered for the normal returns: the market model using the CRSP value-weighted index and the constant-mean-return model. The announcements are categorized into three groups: good news, no news, and bad news. $\bar{\epsilon}^*$ is the sample average abnormal return for the specified day in event time and $\overline{CAR}$ is the sample average cumulative abnormal return for day $-20$ to the specified day. Event time is measured in days relative to the announcement date.

Figure 4.2a. Plot of Cumulative Market-Model Abnormal Return for Earnings Announcements

Figure 4.2b. Plot of Cumulative Constant-Mean-Return-Model Abnormal Return for Earnings Announcements

The results of this example are largely consistent with the existing literature on the information content of earnings. The evidence strongly supports the hypothesis that earnings announcements do indeed convey information useful for the valuation of firms. Focusing on the announcement day (day zero), the sample average abnormal return for the good-news firms using the market model is 0.965%. Since the standard error of the one-day good-news average abnormal return is 0.104%, the value of $J_1$ is 9.28 and the null hypothesis that the event has no impact is strongly rejected. The story is the same for the bad-news firms. The event-day sample abnormal return is $-0.679\%$, with a standard error of 0.098%, leading to $J_1$ equal to $-6.93$ and again strong evidence against the null hypothesis. As would be expected, the abnormal return of the no-news firms is small at $-0.091\%$ and, with a standard error of 0.098%, is less than one standard error from zero. There is also some evidence of the announcement effect on day one. The average abnormal returns are 0.251% and $-0.204\%$ for the good-news and the bad-news firms, respectively. Both these values are more than two standard errors from zero. The source of these day-one effects is likely to be that some of the earnings announcements are made on event day zero after the close of the stock market. In these cases the effects will be captured in the return on day one.
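The reported $J_1$ values are simply the average abnormal returns divided by their standard errors. The short Python check below reproduces them from the day-zero numbers quoted above; the no-news value is not stated in the text but follows from the same quotient.

```python
# Day-zero market-model results quoted above: (average abnormal return, standard error), in percent.
day0 = {
    "good news": (0.965, 0.104),
    "no news": (-0.091, 0.098),
    "bad news": (-0.679, 0.098),
}
j1_day0 = {group: round(avg / se, 2) for group, (avg, se) in day0.items()}
print(j1_day0)  # {'good news': 9.28, 'no news': -0.93, 'bad news': -6.93}
```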

The conclusions using the abnormal returns from the constant-mean-return
model are consistent with those from the market model. However,
there is some loss of precision using the constant-mean-return model, as the
variance of the average abnormal return increases for all three categories.
When measuring abnormal returns with the constant-mean-return model
the standard errors increase from 0.104% to 0.130% for good-news firms,
from 0.098% to 0.124% for no-news firms, and from 0.098% to 0.131%
for bad-news firms. These increases are to be expected when considering
a sample of large firms such as those in the Dow Index, since these stocks
tend to have an important market component whose variability is eliminated
using the market model.
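One back-of-the-envelope way to read these standard errors assumes that the only difference between the two models is the removal of the market component, so that the market-model variance is the constant-mean-return variance scaled by one minus the share explained by the market. Under that assumption the implied share of variance removed is $1 - (\text{s.e.}_{\text{market}}/\text{s.e.}_{\text{constant mean}})^2$. This quantity is an illustration derived from the quoted numbers, not a figure reported in the study.

```python
# Standard errors (percent) of the day-zero average abnormal return:
# group -> (market model, constant-mean-return model), from the text.
std_errors = {
    "good news": (0.104, 0.130),
    "no news": (0.098, 0.124),
    "bad news": (0.098, 0.131),
}

def variance_removed(se_market, se_constant):
    """Implied fraction of constant-mean-return variance eliminated by the market model."""
    return 1.0 - (se_market / se_constant) ** 2

for group, (se_mm, se_cm) in std_errors.items():
    print(f"{group:9s}  variance removed = {variance_removed(se_mm, se_cm):.2f}")
```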

The CAR plots show that to some extent the market gradually learns
about the forthcoming announcement. The average CAR of the good-news
firms gradually drifts up in days −20 to −1, and the average CAR of the
bad-news firms gradually drifts down over this period. In the days after the
announcement the CAR is relatively stable, as would be expected, although
there does tend to be a slight (but statistically insignificant) increase for the
bad-news firms in days two through eight.
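The plotted CAR for each group is the running sum of the daily average abnormal returns from the start of the event window. A minimal sketch, using made-up daily averages rather than the study's data:

```python
from itertools import accumulate

def car_path(avg_abnormal_returns):
    """Running sum of average abnormal returns: CAR(tau_start, tau) for each tau."""
    return list(accumulate(avg_abnormal_returns))

# Hypothetical average abnormal returns (percent) for event days -3 ... +2.
days = [-3, -2, -1, 0, 1, 2]
avg_ar = [0.05, 0.04, 0.06, 0.97, 0.25, 0.01]

for day, car in zip(days, car_path(avg_ar)):
    print(f"day {day:+d}: CAR = {car:.2f}%")
```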


4.4.6 Inferences with Clustering 


In analyzing aggregated abnormal returns, we have thus far assumed that
the abnormal returns on individual securities are uncorrelated in the cross
section. This will generally be a reasonable assumption if the event windows
of the included securities do not overlap in calendar time. The assumption
allows us to calculate the variance of the aggregated sample cumulative
abnormal returns without concern about covariances between individual
sample CARs, since they are zero. However, when the event windows do
overlap, the covariances between the abnormal returns may differ from
zero, and the distributional results presented for the aggregated abnormal
returns are not applicable. Bernard (1987) discusses some of the problems
related to clustering.

When there is one event date in calendar time, clustering can be
accommodated in two different ways. First, the abnormal returns can be
aggregated into a portfolio dated using event time, and the security-level
analysis of Section 4.4 can be applied to the portfolio. This approach allows
for cross-correlation of the abnormal returns.
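A minimal sketch of the portfolio approach, assuming all firms share the event date, the event window is a single day, and estimation-window abnormal returns are held in an $N \times L_1$ NumPy array (the function names and simulated data are hypothetical). Estimating the variance from the equally weighted portfolio's own time series picks up every cross-covariance. The naive estimator $\frac{1}{N^2}\sum_i \hat\sigma_i^2$, which assumes zero covariances, understates it when firms share a calendar-time shock.

```python
import numpy as np

def portfolio_j1(est_ar, event_ar):
    """J_1-style statistic for a one-day window when all firms share the event date.

    est_ar   : (N, L1) abnormal returns over the common estimation window
    event_ar : (N,) event-day abnormal returns
    The portfolio variance is estimated from the portfolio's own time series,
    so cross-correlation between firms is accounted for.
    """
    portfolio_series = est_ar.mean(axis=0)        # one portfolio return per day
    var_portfolio = portfolio_series.var(ddof=1)
    return event_ar.mean() / np.sqrt(var_portfolio)

def naive_var(est_ar):
    """(1/N^2) * sum of firm variances: valid only if firms are uncorrelated."""
    n_firms = est_ar.shape[0]
    return est_ar.var(axis=1, ddof=1).sum() / n_firms**2

# Simulated (hypothetical) data with a shared calendar-time component.
rng = np.random.default_rng(0)
n_firms, l1 = 30, 250
common = rng.normal(0.0, 0.01, size=l1)
est_ar = common + rng.normal(0.0, 0.02, size=(n_firms, l1))

print("portfolio variance:", est_ar.mean(axis=0).var(ddof=1))
print("naive variance:    ", naive_var(est_ar))
print("J_1 for a 2% event-day return:", portfolio_j1(est_ar, np.full(n_firms, 0.02)))
```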

A second way to handle clustering is to analyze the abnormal returns
without aggregation. One can test the null hypothesis that the event has no
impact using unaggregated security-by-security data. The basic approach is
an application of a multivariate regression model with dummy variables for
the event date; it is closely related to the multivariate F-test of the CAPM
presented in Chapter 5. The approach is developed in the papers of Schipper
and Thompson (1983, 1985), Malatesta and Thompson (1985), and Collins
and Dent (1984). It has some advantages relative to the portfolio approach.
First, it can accommodate an alternative hypothesis where some of the firms
have positive abnormal returns and some of the firms have negative
abnormal returns. Second, it can handle cases where there is partial clustering,
that is, where the event date is not the same across firms but there is overlap
in the event windows. This approach also has some drawbacks, however. In
many cases the test statistic has poor finite-sample properties, and often it
has little power against economically reasonable alternatives.


4.5 Modifying the Null Hypothesis 


Thus far we have focused on a single null hypothesis: that the given event
has no impact on the behavior of security returns. With this null hypothesis
either a mean effect or a variance effect represents a violation. However,
in some applications we may be interested in testing only for a mean effect.
In these cases, we need to expand the null hypothesis to allow for changing
(usually increasing) variances.

To accomplish this, we need to eliminate any reliance on past returns
in estimating the variance of the aggregated cumulative abnormal returns.
Instead, we use the cross section of cumulative abnormal returns to form
an estimator of the variance. Boehmer, Musumeci, and Poulsen (1991)
discuss this methodology, which is best applied using the constant-mean-return
model to measure the abnormal return.

The cross-sectional approach to estimating the variance can be applied
to both the average cumulative abnormal return, $\overline{CAR}(\tau_1,\tau_2)$,
and the average standardized cumulative abnormal return,
$\overline{SCAR}(\tau_1,\tau_2)$. Using the cross section to form an
estimator of the variance, we have

$$\widehat{\mathrm{Var}}[\overline{CAR}(\tau_1,\tau_2)] = \frac{1}{N^2}\sum_{i=1}^{N}\left(\widehat{CAR}_i(\tau_1,\tau_2) - \overline{CAR}(\tau_1,\tau_2)\right)^2, \qquad (4.5.1)$$

$$\widehat{\mathrm{Var}}[\overline{SCAR}(\tau_1,\tau_2)] = \frac{1}{N^2}\sum_{i=1}^{N}\left(\widehat{SCAR}_i(\tau_1,\tau_2) - \overline{SCAR}(\tau_1,\tau_2)\right)^2. \qquad (4.5.2)$$

For these estimators of the variances to be consistent we require the
abnormal returns to be uncorrelated in the cross section. An absence of
clustering is sufficient for this requirement. Note that cross-sectional
homoskedasticity is not required for consistency. Given these variance
estimators, the null hypothesis that the cumulative abnormal returns are
zero can then be tested using large-sample theory, given the consistent
estimators of the variances in (4.5.1) and (4.5.2).
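A minimal sketch of the test based on (4.5.1). Because the variance comes from the cross-sectional dispersion of the CARs rather than from estimation-window returns, an event-induced increase in variance is absorbed automatically. The sketch assumes the CARs are uncorrelated across firms and relies on the large-sample normal approximation; the same function applies unchanged to standardized CARs and (4.5.2). The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def cross_sectional_test(cars):
    """Test mean CAR = 0 with the cross-sectional variance estimator (4.5.1).

    Returns (mean CAR, estimated variance of the mean, z statistic, two-sided p-value).
    """
    n = len(cars)
    car_bar = fmean(cars)
    var_hat = sum((c - car_bar) ** 2 for c in cars) / n**2
    z = car_bar / sqrt(var_hat)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return car_bar, var_hat, z, p_value

# Hypothetical event-window CARs (decimal returns) for eight firms.
cars = [0.021, -0.004, 0.035, 0.012, 0.008, -0.010, 0.027, 0.015]
print(cross_sectional_test(cars))
```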

One may also be interested in the impact of an event on the risk of a
firm. The relevant measure of risk must be defined before this issue can
be addressed. One choice as a risk measure is the market-model beta as
implied by the Capital Asset Pricing Model. Given this choice, the market
model can be formulated to allow the beta to change over the event window
and the stability of the beta can be examined. See Kane and Unal (1988)
for an application of this idea.
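One simple formulation, assumed here for illustration rather than taken from Kane and Unal (1988), adds an interaction between the market return and an event-window dummy: $R_{it} = \alpha_i + \beta_i R_{mt} + \delta_i D_t R_{mt} + \epsilon_{it}$, with $D_t = 1$ inside the event window. A $t$-test on $\delta_i$ then examines beta stability. Simulated data with a true beta shift of 0.5 stand in for real returns.

```python
import numpy as np

def beta_shift(r_stock, r_market, in_event):
    """Fit R_i = a + b*R_m + d*(D*R_m) + e by OLS.

    in_event is a boolean array marking event-window days (D = 1).
    Returns (b, d, t-statistic for d) using homoskedastic OLS standard errors.
    """
    d = in_event.astype(float)
    X = np.column_stack([np.ones_like(r_market), r_market, d * r_market])
    coef, *_ = np.linalg.lstsq(X, r_stock, rcond=None)
    resid = r_stock - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1], coef[2], coef[2] / np.sqrt(cov[2, 2])

# Hypothetical data: beta rises from 1.0 to 1.5 in the final 100 days.
rng = np.random.default_rng(1)
T = 400
r_m = rng.normal(0.0005, 0.01, T)
event = np.zeros(T, dtype=bool)
event[300:] = True
r_i = 0.0002 + 1.0 * r_m + 0.5 * event * r_m + rng.normal(0.0, 0.005, T)

beta, delta, t_delta = beta_shift(r_i, r_m, event)
print(f"beta = {beta:.2f}, change in beta = {delta:.2f}, t = {t_delta:.1f}")
```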


4.6 Analysis of Power 


To interpret an event study, we need to know our ability to detect
the presence of a nonzero abnormal return. In this section we ask what is
the likelihood that an event-study test rejects the null hypothesis for a given
level of abnormal return associated with an event; that is, we evaluate the
power of the test.

We consider a two-sided test of the null hypothesis using the
cumulative-abnormal-return-based statistic $J_1$ from (4.4.22). We assume that
the abnormal returns are uncorrelated across securities; thus the variance of
$\overline{CAR}$ is $\bar\sigma^2(\tau_1,\tau_2)$, where
$\bar\sigma^2(\tau_1,\tau_2) = \frac{1}{N^2}\sum_{i=1}^{N}\sigma_i^2(\tau_1,\tau_2)$
and $N$ is the sample size. Under the null hypothesis the distribution of
$J_1$ is standard normal. For a two-sided test of size $\alpha$ we reject the
null hypothesis if $J_1 < \Phi^{-1}(\alpha/2)$ or if $J_1 > \Phi^{-1}(1-\alpha/2)$,
where $\Phi(\cdot)$ is the standard normal cumulative distribution
function (CDF).




Given an alternative hypothesis $H_A$ and the CDF of $J_1$ for this hypothesis,
we can tabulate the power of a test of size $\alpha$ using

$$\wp(\alpha, H_A) = \Pr\left(J_1 < \Phi^{-1}(\alpha/2) \mid H_A\right) + \Pr\left(J_1 > \Phi^{-1}(1-\alpha/2) \mid H_A\right). \qquad (4.6.1)$$
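To evaluate (4.6.1), one needs the distribution of $J_1$ under $H_A$. If every firm's CAR has mean $\delta$, the abnormal returns are normal, and all firms share the variance $\sigma^2$ (so $\bar\sigma = \sigma/\sqrt{N}$), then $J_1$ is normal with unit variance and mean $\delta/\bar\sigma$. This closed form reproduces the entries of Table 4.2. A minimal standard-library sketch under those assumptions:

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_j1(abnormal_return, sigma, n, alpha=0.05):
    """Power (4.6.1) of the two-sided J_1 test.

    Assumes J_1 ~ N(mu, 1) under H_A with mu = abnormal_return / sigma_bar and
    sigma_bar = sigma / sqrt(n), i.e. equal CAR variances sigma**2 across firms.
    """
    mu = abnormal_return / (sigma / sqrt(n))
    lower = STD_NORMAL.inv_cdf(alpha / 2)
    upper = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(lower - mu) + (1 - STD_NORMAL.cdf(upper - mu))

# 0.5% abnormal return with sigma = 2% and sigma = 4%.
for n in (1, 20, 50, 200):
    print(n, round(power_j1(0.005, 0.02, n), 2), round(power_j1(0.005, 0.04, n), 2))
```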


With this framework in place, we need to posit specific alternative
hypotheses. Alternatives are constructed to be consistent with event studies
using data sampled at a daily interval. We build eight alternative hypotheses
using four levels of abnormal returns, 0.5%, 1.0%, 1.5%, and 2.0%, and two
levels for the average variance of the cumulative abnormal return of a given
security over the sampling interval, 0.0004 and 0.0016. These variances
correspond to standard deviations of 2% and 4%, respectively. The sample size,
that is, the number of securities for which the event occurs, is varied from
1 to 200. We document the power for a test with a size of 5% ($\alpha = 0.05$),
giving values of −1.96 and 1.96 for $\Phi^{-1}(\alpha/2)$ and
$\Phi^{-1}(1-\alpha/2)$, respectively. In applications, of course, the power
of the test should be considered when selecting the size.

The power results are presented in Table 4.2 and are plotted in Figures
4.3a and 4.3b. The results in the left panel of Table 4.2 and in Figure 4.3a
are for the case where the average variance is 0.0004, corresponding to a
standard deviation of 2%. This is an appropriate value for an event which
does not lead to increased variance and can be examined using a one-day
event window. Such a case is likely to give the event-study methodology its
highest power. The results illustrate that when the abnormal return is only
0.5% the power can be low. For example, with a sample size of 20 the power
of a 5% test is only 0.20. One needs a sample of over 60 firms before the
power reaches 0.50. However, for a given sample size, increases in power
are substantial when the abnormal return is larger. For example, when the
abnormal return is 2.0% the power of a 5% test with 20 firms is almost 1.00,
with a value of 0.99. The general result for a variance of 0.0004 is that
when the abnormal return is larger than 1% the power is quite high even
for small sample sizes. When the abnormal return is small, a larger sample
size is necessary to achieve high power.
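The claim that more than 60 firms are needed can be checked by searching for the smallest $N$ whose power reaches a target, using the same normal approximation as (4.6.1) with equal variances across firms. A minimal sketch; the function names and the search cap `n_max` are illustrative choices:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power(abnormal_return, sigma, n, alpha=0.05):
    """Two-sided power of J_1 when all firm CARs share the same mean and variance."""
    mu = abnormal_return * sqrt(n) / sigma
    crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(-crit - mu) + 1 - Z.cdf(crit - mu)

def min_sample_size(abnormal_return, sigma, target=0.5, alpha=0.05, n_max=10_000):
    """Smallest number of event observations with power >= target (None if not reached)."""
    for n in range(1, n_max + 1):
        if power(abnormal_return, sigma, n, alpha) >= target:
            return n
    return None

print(min_sample_size(0.005, 0.02))   # 0.5% abnormal return, sigma = 2%
```

Because power depends on $N$ only through $\delta\sqrt{N}/\sigma$ in this approximation, doubling $\sigma$ requires roughly four times as many firms for the same power.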

In the right panel of Table 4.2 and in Figure 4.3b the power results
are presented for the case where the average variance of the cumulative
abnormal return is 0.0016, corresponding to a standard deviation of 4%.
This case corresponds roughly to either a multi-day event window or to a
one-day event window with the event leading to increased variance which
is accommodated as part of the null hypothesis. Here we see a dramatic
decline in the power of a 5% test. When the CAR is 0.5% the power is only
0.09 with 20 firms and only 0.42 with a sample of 200 firms. This magnitude
of abnormal return is difficult to detect with the larger variance of 0.0016.
In contrast, when the CAR is as large as 1.5% or 2.0% the 5% test still has
reasonable power. For example, when the abnormal return is 1.5% and
there is a sample size of 30, the power is 0.54. Generally, if the abnormal
return is large one will have little difficulty rejecting the null hypothesis of
no abnormal return.

Table 4.2. Power of event-study test statistic $J_1$ to reject the null
hypothesis that the abnormal return is zero.

Sample   Abnormal return, σ = 2%         Abnormal return, σ = 4%
Size     0.5%   1.0%   1.5%   2.0%       0.5%   1.0%   1.5%   2.0%
   1     0.06   0.08   0.12   0.17       0.05   0.06   0.07   0.08
   2     0.06   0.11   0.19   0.29       0.05   0.06   0.08   0.11
   3     0.07   0.14   0.25   0.41       0.06   0.07   0.10   0.14
   4     0.08   0.17   0.32   0.52       0.06   0.08   0.12   0.17
   5     0.09   0.20   0.39   0.61       0.06   0.09   0.13   0.20
   6     0.09   0.23   0.45   0.69       0.06   0.09   0.15   0.23
   7     0.10   0.26   0.51   0.75       0.06   0.10   0.17   0.26
   8     0.11   0.29   0.56   0.81       0.06   0.11   0.19   0.29
   9     0.12   0.32   0.61   0.85       0.07   0.12   0.20   0.32
  10     0.12   0.35   0.66   0.89       0.07   0.12   0.22   0.35
  11     0.13   0.38   0.70   0.91       0.07   0.13   0.24   0.38
  12     0.14   0.41   0.74   0.93       0.07   0.14   0.25   0.41
  13     0.15   0.44   0.77   0.95       0.07   0.15   0.27   0.44
  14     0.15   0.46   0.80   0.96       0.08   0.15   0.29   0.46
  15     0.16   0.49   0.83   0.97       0.08   0.16   0.31   0.49
  16     0.17   0.52   0.85   0.98       0.08   0.17   0.32   0.52
  17     0.18   0.54   0.87   0.98       0.08   0.18   0.34   0.54
  18     0.19   0.56   0.89   0.99       0.08   0.19   0.36   0.56
  19     0.19   0.59   0.90   0.99       0.08   0.19   0.37   0.59
  20     0.20   0.61   0.92   0.99       0.09   0.20   0.39   0.61
  25     0.24   0.71   0.96   1.00       0.10   0.24   0.47   0.71
  30     0.28   0.78   0.98   1.00       0.11   0.28   0.54   0.78
  35     0.32   0.84   0.99   1.00       0.11   0.32   0.60   0.84
  40     0.35   0.89   1.00   1.00       0.12   0.35   0.66   0.89
  45     0.39   0.92   1.00   1.00       0.13   0.39   0.71   0.92
  50     0.42   0.94   1.00   1.00       0.14   0.42   0.76   0.94
  60     0.49   0.97   1.00   1.00       0.16   0.49   0.83   0.97
  70     0.55   0.99   1.00   1.00       0.18   0.55   0.88   0.99
  80     0.61   0.99   1.00   1.00       0.20   0.61   0.92   0.99
  90     0.66   1.00   1.00   1.00       0.22   0.66   0.94   1.00
 100     0.71   1.00   1.00   1.00       0.24   0.71   0.96   1.00
 120     0.78   1.00   1.00   1.00       0.28   0.78   0.98   1.00
 140     0.84   1.00   1.00   1.00       0.32   0.84   0.99   1.00
 160     0.89   1.00   1.00   1.00       0.35   0.89   1.00   1.00
 180     0.92   1.00   1.00   1.00       0.39   0.92   1.00   1.00
 200     0.94   1.00   1.00   1.00       0.42   0.94   1.00   1.00

The power is reported for a test with a size of 5%. The sample size is the
number of event observations included in the study, and σ is the square
root of the average variance of the abnormal return across firms.

Figure 4.3. Power of Event-Study Test Statistic $J_1$ to Reject the Null
Hypothesis that the Abnormal Return Is Zero, When the Square Root of the
Average Variance of the Abnormal Return Across Firms Is (a) 2% and (b) 4%


We have calculated power analytically using distributional assumptions.
If these distributional assumptions are inappropriate then our power
calculations may be inaccurate. However, Brown and Warner (1985) explore this
issue and find that the analytical computations and the empirical power are
very close.
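The agreement between analytical and empirical power can be illustrated by simulation. The sketch below draws normal CARs, so it only checks the analytical formula against its own assumptions; Brown and Warner (1985) instead work with actual security returns, which one could mimic by resampling historical abnormal returns in place of `rng.gauss`. The parameters (20 firms, a 1% abnormal return, σ = 2%) and the function name are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulated_power(abnormal_return, sigma, n, reps=20_000, alpha=0.05, seed=42):
    """Share of simulated samples in which the two-sided J_1 test rejects."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    sigma_bar = sigma / sqrt(n)              # variance treated as known, as in J_1
    rejections = 0
    for _ in range(reps):
        cars = [rng.gauss(abnormal_return, sigma) for _ in range(n)]
        j1 = fmean(cars) / sigma_bar
        rejections += abs(j1) > crit
    return rejections / reps

print(simulated_power(0.01, 0.02, 20))   # the analytical value in Table 4.2 is 0.61
```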

It is difficult to reach general conclusions concerning the ability
of event-study methodology to detect nonzero abnormal returns. When
conducting an event study it is necessary to evaluate the power given the
parameters and objectives of the study. If the power seems sufficient then
one can proceed; otherwise one should search for ways of increasing the
power. This can be done by increasing the sample size, shortening the event
window, or developing more specific predictions of the null hypothesis.


4.7 Nonparametric Tests 


The methods discussed to this point are parametric in nature, in that specific
assumptions have been made about the distribution of abnormal returns.
Alternative nonparametric approaches are available which are free of
specific assumptions concerning the distribution of returns. In this section we
discuss two common nonparametric tests for event studies, the sign test and
the rank test.

The sign test, which is based on the sign of the abnormal return, requires
that the abnormal returns (or more generally cumulative abnormal
returns) are independent across securities and that the expected proportion
of positive abnormal returns under the null hypothesis is 0.5. The basis
of the test is that under the null hypothesis it is equally probable that the
CAR will be positive or negative. If, for example, the alternative hypothesis
is that there is a positive abnormal return associated with a given event,
the null hypothesis is $H_0\colon p \le 0.5$ and the alternative is
$H_A\colon p > 0.5$, where $p = \Pr(CAR_i \ge 0.0)$. To calculate the test
statistic we need the number of cases where the abnormal return is positive,
$N^+$, and the total number of cases, $N$. Letting $J_3$ be the test statistic,
then asymptotically as $N$ increases we have

$$J_3 = \left[\frac{N^+}{N} - 0.5\right]\frac{\sqrt{N}}{0.5} \stackrel{a}{\sim} \mathcal{N}(0,1).$$

For a test of size $\alpha$, $H_0$ is rejected if $J_3 > \Phi^{-1}(1-\alpha)$.
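A minimal sketch of the one-sided sign test. Following $p = \Pr(CAR_i \ge 0.0)$, a CAR of exactly zero is counted as positive here; that tie-handling convention is an assumption of the sketch. The statistic relies on the large-$N$ normal approximation, and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sign_test(cars, alpha=0.05):
    """One-sided sign test of H0: p <= 0.5 against HA: p > 0.5.

    Returns (J_3, whether H0 is rejected at size alpha).
    """
    n = len(cars)
    n_pos = sum(1 for c in cars if c >= 0.0)
    j3 = (n_pos / n - 0.5) * sqrt(n) / 0.5
    return j3, j3 > NormalDist().inv_cdf(1 - alpha)

# Hypothetical example: 60 positive CARs out of 100.
cars = [0.01] * 60 + [-0.01] * 40
print(sign_test(cars))   # J_3 is about 2.0, so H0 is rejected at the 5% level
```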

A weakness of the sign test is that it may not be well specified if the
distribution of abnormal returns is skewed, as can be the case with daily
data. With skewed abnormal returns, the expected proportion of positive
abnormal returns can differ from one half even under the null hypothesis.
In response to this possible shortcoming, Corrado (1989) proposes a
nonparametric rank test for abnormal performance in event studies. We briefly
describe his test of the null hypothesis that there is no abnormal return on
event day zero. The framework can be easily altered for events occurring
over multiple days.

Drawing on notation previously introduced, consider a sample of $L_2$
abnormal returns for each of $N$ securities. To implement the rank test it
is necessary for each security to rank the abnormal returns from 1 to $L_2$.
Define $K_{i\tau}$ as the rank of the abnormal return of security $i$ for event
time period $\tau$. Recall that $\tau$ ranges from $T_1+1$ to $T_2$ and
$\tau = 0$ is the event day. The rank test uses the fact that the expected rank
under the null hypothesis is $(L_2+1)/2$. The test statistic for the null
hypothesis of no abnormal return on event day zero is

$$J_4 = \frac{1}{N}\sum_{i=1}^{N}\left(K_{i0} - \frac{L_2+1}{2}\right)\Big/ s(L_2), \qquad (4.7.1)$$

where

$$s(L_2) = \sqrt{\frac{1}{L_2}\sum_{\tau=T_1+1}^{T_2}\left(\frac{1}{N}\sum_{i=1}^{N}\left(K_{i\tau} - \frac{L_2+1}{2}\right)\right)^2}.$$

Tests of the null hypothesis can be implemented using the result that the
asymptotic null distribution of $J_4$ is standard normal. Corrado (1989) gives
further details.
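A minimal sketch of (4.7.1) in plain Python, following the notation above: ranks are taken within each security over its $L_2$ periods, and `event_index` marks where $\tau = 0$ falls in each list. Ties are broken by position, an assumption that rarely matters for continuous returns. The tiny example only shows the mechanics; it is far too small for the asymptotic normal approximation to apply.

```python
from math import sqrt

def ranks(values):
    """Ranks 1..L of a sequence (ties broken by position)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def corrado_rank_test(abnormal_returns, event_index):
    """J_4 from (4.7.1).

    abnormal_returns : list of N lists, each with the L2 abnormal returns of one security
    event_index      : position of event day 0 within each list
    """
    n = len(abnormal_returns)
    length = len(abnormal_returns[0])
    expected = (length + 1) / 2
    k = [ranks(row) for row in abnormal_returns]
    # Cross-sectional average rank deviation for each event-time period tau.
    avg_dev = [sum(k[i][t] - expected for i in range(n)) / n for t in range(length)]
    s = sqrt(sum(d * d for d in avg_dev) / length)
    return avg_dev[event_index] / s

# Two securities, three periods, event day in the last position (toy data).
print(corrado_rank_test([[0.1, 0.2, 0.3], [0.3, 0.1, 0.5]], event_index=2))
```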
Typically, these nonparametric tests are not used in isolation but in
conjunction with their parametric counterparts. The nonparametric tests
enable one to check the robustness of conclusions based on parametric
tests. Such a check can be worthwhile, as illustrated by the work of Campbell
and Wasley (1993). They find that for daily returns on NASDAQ stocks
the nonparametric rank test provides more reliable inferences than do the
standard parametric tests.


4.8 Cross-Sectional Models

Theoretical models often suggest that there should be an association
between the magnitude of abnormal returns and characteristics specific to
the event observation. To investigate this association, an appropriate tool
is a cross-sectional regression of abnormal returns on the characteristics of
interest. To set up the model, define $y$ as an $(N \times 1)$ vector of cumulative
abnormal return observations and $X$ as an $(N \times K)$ matrix of
characteristics. The first column of $X$ is a vector of ones and each of the
remaining $(K-1)$ columns is a vector consisting of the characteristic for each
event observation. Then, for the model, we have the regression equation

$$y = X\theta + \eta, \qquad (4.8.1)$$

where $\theta$ is the $(K \times 1)$ coefficient vector and $\eta$ is the $(N \times 1)$
disturbance vector. Assuming $\mathrm{E}[\eta \mid X] = 0$, we can consistently
estimate $\theta$ using OLS. For the OLS estimator we have

$$\hat\theta = (X'X)^{-1}X'y. \qquad (4.8.2)$$


Assuming the elements of $\eta$ are cross-sectionally uncorrelated and
homoskedastic, inferences can be derived using the usual OLS standard errors.
Defining $\sigma_\eta^2$ as the variance of the elements of $\eta$, we have

$$\mathrm{Var}[\hat\theta] = (X'X)^{-1}\sigma_\eta^2. \qquad (4.8.3)$$

Using the unbiased estimator for $\sigma_\eta^2$,

$$\hat\sigma_\eta^2 = \frac{1}{N-K}\,\hat\eta'\hat\eta, \qquad (4.8.4)$$

where $\hat\eta = y - X\hat\theta$, we can construct $t$-statistics to assess the
statistical significance of the elements of $\hat\theta$. Alternatively, without
assuming homoskedasticity, we can construct heteroskedasticity-consistent
$z$-statistics using

$$\widehat{\mathrm{Var}}[\hat\theta] = \frac{1}{N}\left(\frac{X'X}{N}\right)^{-1}\left[\frac{1}{N}\sum_{i=1}^{N} x_i'x_i\,\hat\eta_i^2\right]\left(\frac{X'X}{N}\right)^{-1}, \qquad (4.8.5)$$

where $x_i$ is the $i$th row of $X$ and $\hat\eta_i$ is the $i$th element of
$\hat\eta$. This expression for the standard errors can be derived using the
Generalized Method of Moments framework in Section A.2 of the Appendix and
also follows from the results of White (1980). The use of
heteroskedasticity-consistent standard errors is advised since there is no
reason to expect the residuals of (4.8.1) to be homoskedastic.
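A minimal NumPy sketch of (4.8.2) through (4.8.5). The explicit inverse is used only to mirror the formulas; in practice a least-squares or QR solve is numerically preferable. The four-observation data set is illustrative only, since the White estimator is justified asymptotically.

```python
import numpy as np

def cross_sectional_ols(X, y):
    """OLS (4.8.2) with homoskedastic (4.8.3)-(4.8.4) and White (4.8.5) covariances."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    theta = xtx_inv @ X.T @ y
    resid = y - X @ theta
    sigma2 = resid @ resid / (n - k)                  # unbiased estimator (4.8.4)
    var_ols = xtx_inv * sigma2                        # (4.8.3)
    meat = (X * resid[:, None] ** 2).T @ X            # sum_i x_i' x_i eta_i^2
    var_white = xtx_inv @ meat @ xtx_inv              # (4.8.5); the 1/N factors cancel
    return theta, var_ols, var_white

# Tiny illustrative data set: intercept plus one characteristic.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0])
theta, var_ols, var_white = cross_sectional_ols(X, y)
print(theta, np.sqrt(np.diag(var_ols)), np.sqrt(np.diag(var_white)))
```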

Asquith and Mullins (1986) provide an example of this approach. The
two-day cumulative abnormal return for the announcement of an equity
offering is regressed on the size of the offering and on the cumulative
abnormal return prior to the announcement. They find that the
magnitude of the (negative) abnormal return associated with the
announcement of equity offerings is related to both these variables. Larger
pre-event cumulative abnormal returns are associated with less negative
abnormal returns, and larger offerings are associated with more negative
abnormal returns. These findings are consistent with theoretical predictions
which they discuss.

One must be careful in interpreting the results of the cross-sectional
regression approach. In many situations, the event-window abnormal return
will be related to firm characteristics not only through the valuation effects
of the event but also through a relation between the firm characteristics
and the extent to which the event is anticipated. This can happen when
investors rationally use firm characteristics to forecast the likelihood of the
event occurring. In these cases, a linear relation between the firm
characteristics and the valuation effect of the event can be hidden. Malatesta and
Thompson (1985) and Lanen and Thompson (1988) provide examples of
this situation.

Technically, the relation between the firm characteristics and the degree
of anticipation of the event introduces a selection bias. The assumption
that the regression residual is uncorrelated with the regressors,
$\mathrm{E}[\eta \mid X] = 0$, breaks down and the OLS estimators are
inconsistent. Consistent estimators can be derived by explicitly allowing for
the selection bias. Acharya (1988, 1993) and Eckbo, Maksimovic, and Williams
(1990) provide examples of this. Prabhala (1995) provides a good discussion
of this problem and the possible solutions. He argues that, despite
misspecification, under weak conditions the OLS approach can be used for
inferences and the statistics can be interpreted as lower bounds on the true
significance level of the estimates.


4.9 Further Issues


A number of further issues often arise when conducting an event study. We
discuss some of these in this section.


4.9.1 Role of the Sampling Interval 


If the timing of an event is known precisely, then the ability to statistically
identify the effect of the event will be higher for a shorter sampling interval.
The increase results from reducing the variance of the abnormal return
without changing the mean. We evaluate the empirical importance of this
issue by comparing the analytical formula for the power of the test statistic
$J_1$ with a daily sampling interval to the power with a weekly and a monthly
interval. We assume that a week consists of five days and a month is 22 days.
The variance of the abnormal return for an individual event observation is
assumed to be $(4\%)^2$ on a daily basis and linear in time.

In Figure 4.4, we plot the power of the test of no event effect against
the alternative of an abnormal return of 1% for 1 to 200 securities. As
one would expect given the analysis of Section 4.6, the decrease in power
going from a daily interval to a monthly interval is severe. For example,
with 50 securities the power for a 5% test using daily data is 0.94, whereas
the power using weekly and monthly data is only 0.35 and 0.12, respectively.
The clear message is that there is a substantial payoff in terms of increased
power from reducing the length of the event window. Morse (1984) presents
a detailed analysis of the choice of daily versus monthly data and draws the
same conclusion.
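The effect of the sampling interval can be sketched with the same normal approximation used for (4.6.1), letting the variance grow linearly with the number of trading days (1, 5, and 22). The powers quoted above for 50 securities (0.94, 0.35, 0.12) are reproduced when the abnormal return is half the daily standard deviation. The sketch therefore uses a 2% abnormal return with a 4% daily standard deviation, the same combination that gives the 2.0% column of the σ = 4% panel of Table 4.2. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power_for_interval(abnormal_return, daily_sigma, days, n, alpha=0.05):
    """Power of the two-sided J_1 test when the abnormal return is measured over
    `days` trading days and the variance grows linearly with the interval length."""
    sigma = daily_sigma * sqrt(days)             # variance linear in time
    mu = abnormal_return * sqrt(n) / sigma
    crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(-crit - mu) + 1 - Z.cdf(crit - mu)

for label, days in [("daily", 1), ("weekly", 5), ("monthly", 22)]:
    print(label, round(power_for_interval(0.02, 0.04, days, n=50), 2))
```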
Figure 4.4. Power of Event-Study Test Statistic $J_1$ to Reject the Null
Hypothesis that the Abnormal Return Is Zero, for Different Sampling Intervals,
When the Square Root of the Average Variance of the Abnormal Return Across
Firms Is 4% for the Daily Interval


A sampling interval of one day is not the shortest interval possible.
With the increased availability of transaction data, recent studies have used
observation intervals of duration shorter than one day. The use of
intradaily data involves some complications, however, of the sort discussed in
Chapter 3, and so the net benefit of very short intervals is unclear. Barclay
and Litzenberger (1988) discuss the use of intradaily data in event studies.


4.9.2 Inferences with Event-Date Uncertainty


Thus far we have assumed that the event date can be identified with certainty.
However, in some studies it may be difficult to identify the exact date. A
common example is when collecting event dates from financial publications
such as the Wall Street Journal. When the event announcement appears in
the newspaper one cannot be certain if the market was informed before
the close of the market the prior trading day. If this is the case then the
prior day is the event day; if not, then the current day is the event day. The
usual method of handling this problem is to expand the event window to
two days: day 0 and day +1. While there is a cost to expanding the event
window, the results in Section 4.6 indicate that the power properties of
two-day event windows are still good, suggesting that it is worth bearing the
cost to avoid the risk of missing the event.




Ball and Torous (1988) investigate this issue. They develop a maximum- 
likelihood estimation procedure which accommodates event-date uncer- 
tainty and examine results of their explicit procedure versus the informal 
procedure of expanding the event window. The results indicate that the 
informal procedure works well and there is little to gain from the more 
elaborate estimation framework. 


4.9.3 Possible Biases 


Event studies are subject to a number of possible biases. Nonsynchronous
trading can introduce a bias. The nontrading or nonsynchronous trading
effect arises when prices are taken to be recorded at time intervals of one
length when in fact they are recorded at time intervals of other, possibly
irregular, lengths. For example, the daily prices of securities usually
employed in event studies are generally "closing" prices, prices at which the
last transaction in each of those securities occurred during the trading day.
These closing prices generally do not occur at the same time each day, but by
calling them "daily" prices, we have implicitly and incorrectly assumed that
they are equally spaced at 24-hour intervals. As we showed in Section 3.1
of Chapter 3, this nontrading effect induces biases in the moments and
co-moments of returns.

The influence of the nontrading effect on the variances and covariances
of individual stocks and portfolios naturally feeds into a bias for the
market-model beta. Scholes and Williams (1977) present a consistent estimator of
beta in the presence of nontrading based on the assumption that the true
return process is uncorrelated through time. They also present some
empirical evidence showing the nontrading-adjusted beta estimates of thinly
traded securities to be approximately 10 to 20% larger than the unadjusted
estimates. However, for actively traded securities, the adjustments are
generally small and unimportant.
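A sketch of the Scholes-Williams estimator, $\hat\beta^{SW} = (\hat\beta^- + \hat\beta + \hat\beta^+)/(1 + 2\hat\rho_m)$, where $\hat\beta^-$, $\hat\beta$, and $\hat\beta^+$ are slopes from simple regressions of $R_{it}$ on $R_{m,t-1}$, $R_{mt}$, and $R_{m,t+1}$, and $\hat\rho_m$ is the first-order autocorrelation of the market return. The thin-trading simulation is a crude, hypothetical stand-in: on days without a trade the stock's return accumulates into the next trading day. Because the estimator corrects only a one-day lag, it recovers most, but not all, of the bias when nontrading spans several days.

```python
import numpy as np

def scholes_williams_beta(r_stock, r_market):
    """Scholes-Williams (1977) beta: (b_lag + b_0 + b_lead) / (1 + 2 * rho_m)."""
    r = r_stock[1:-1]                      # align so lag and lead regressors exist
    def slope(x):
        return np.cov(r, x)[0, 1] / np.var(x, ddof=1)
    b_lag = slope(r_market[:-2])           # R_m,t-1
    b_0 = slope(r_market[1:-1])            # R_m,t
    b_lead = slope(r_market[2:])           # R_m,t+1
    rho_m = np.corrcoef(r_market[:-1], r_market[1:])[0, 1]
    return (b_lag + b_0 + b_lead) / (1 + 2 * rho_m)

# Hypothetical thin-trading simulation.
rng = np.random.default_rng(7)
T, beta_true, p_no_trade = 5000, 1.2, 0.2
r_m = rng.normal(0.0, 0.01, T)
true_r = beta_true * r_m + rng.normal(0.0, 0.01, T)
observed = np.zeros(T)
pending = 0.0
for t in range(T):
    pending += true_r[t]
    if rng.random() > p_no_trade:          # the stock trades today
        observed[t], pending = pending, 0.0

ols_beta = np.cov(observed, r_m)[0, 1] / np.var(r_m, ddof=1)
print(round(ols_beta, 2), round(scholes_williams_beta(observed, r_m), 2))
```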

Jain (1986) considers the influence of thin trading on the distribution
of the abnormal returns from the market model with the beta estimated
using the Scholes-Williams approach. He compares the distribution of these
abnormal returns to the distribution of the abnormal returns using the usual
OLS betas and finds that the differences are minimal. This suggests that in
general the adjustment for thin trading is not important.

The statistical analysis of Sections 4.3, 4.4, and 4.5 is based on the
assumption that returns are jointly normal and temporally IID. Departures
from this assumption can lead to biases. The normality assumption is
important for the exact finite-sample results. Without assuming normality, all
results would be asymptotic. However, this is generally not a problem for
event studies since the test statistics converge to their asymptotic
distributions rather quickly. Brown and Warner (1985) discuss this issue.




There can also be an upward bias in cumulative abnormal returns when
these are calculated in the usual way. The bias arises from the
observation-by-observation rebalancing to equal weights implicit in the calculation of
the aggregate cumulative abnormal return combined with the use of
transaction prices which can represent both the bid and the ask side of the
market. Blume and Stambaugh (1983) analyze this bias and show that it
can be important for studies using low-market-capitalization firms which
have, in percentage terms, wide bid-ask spreads. In these cases the bias can
be eliminated by considering cumulative abnormal returns that represent
buy-and-hold strategies.
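A stylized illustration of the bias, assuming a stock whose true value never changes but whose recorded closing price alternates between a bid of 9.90 and an ask of 10.10 (hypothetical numbers). Summing single-period returns, as the usual CAR calculation does, produces a spurious positive cumulative return. The buy-and-hold return over the same span is zero.

```python
def period_returns(prices):
    """Simple returns between consecutive recorded prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

def summed_cumulative(returns):
    """Cumulate by summing per-period returns (the usual CAR construction)."""
    return sum(returns)

def buy_and_hold(returns):
    """Compound per-period returns: the return to holding the position throughout."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0

# True value 10.00 throughout; closes alternate between bid and ask.
prices = [9.90, 10.10] * 10 + [9.90]        # 21 prices, 20 daily returns
rets = period_returns(prices)
print(f"summed: {summed_cumulative(rets):.4%}, buy-and-hold: {buy_and_hold(rets):.4%}")
```

The summed figure is about 0.40% over 20 days even though nothing happened to the stock's value, which is why buy-and-hold cumulation removes the bias.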


In closing, we briefly discuss examples of event-study successes and
limitations. Perhaps the most successful applications have been in the area of
corporate finance. Event studies dominate the empirical research in this
area. Important examples include the wealth effects of mergers and
acquisitions and the price effects of financing decisions by firms. Studies of these
events typically focus on the abnormal return around the date of the first
announcement.

In the 1960s there was a paucity of empirical evidence on the wealth
effects of mergers and acquisitions. For example, Manne (1965) discusses
the various arguments for and against mergers. At that time the debate
centered on the extent to which mergers should be regulated in order to foster
competition in the product markets. Manne argues that mergers represent
a natural outcome in an efficiently operating market for corporate control
and consequently provide protection for shareholders. He downplays the
importance of the argument that mergers reduce competition. At the
conclusion of his article Manne suggests that the two competing hypotheses
for mergers could be separated by studying the price effects of the involved
corporations. He hypothesizes that if mergers created market power one
would observe price increases for both the target and acquirer. In contrast,
if the merger represented the acquiring corporation paying for control of
the target, one would observe a price increase for the target only and not
for the acquirer. However, at that time Manne concludes, in reference to
the price effects of mergers, that "... no data are presently available on this
subject."

Since that time an enormous body of empirical evidence on mergers and
acquisitions has developed which is dominated by the use of event studies.
The general result is that, given a successful takeover, the abnormal returns
of the targets are large and positive and the abnormal returns of the acquirer
are close to zero. Jarrell and Poulsen (1989) find that the average abnormal
return for target shareholders exceeds 20% for a sample of 663 successful
takeovers from 1960 to 1985. In contrast, the abnormal return for acquirers
is close to zero at 1.14%, and even negative at −1.10% in the 1980s.

Eckbo (1983) explicitly addresses the role of increased market power
in explaining merger-related abnormal returns. He separates mergers of
competing firms from other mergers and finds no evidence that the wealth
effects for competing firms are different. Further, he finds no evidence that
rivals of firms merging horizontally experience negative abnormal returns.
From this he concludes that reduced competition in the product market
is not an important explanation for merger gains. This leaves competition
for corporate control a more likely explanation. Much additional empirical
work in the area of mergers and acquisitions has been conducted. Jensen
and Ruback (1983) and Jarrell, Brickley, and Netter (1988) provide detailed
surveys of this work.

A number of robust results have been developed from event studies
of financing decisions by corporations. When a corporation announces
that it will raise capital in external markets there is on average a negative
abnormal return. The magnitude of the abnormal return depends on the
source of external financing. Asquith and Mullins (1986) study a sample of
266 firms announcing an equity issue in the period 1963 to 1981 and find
that the two-day average abnormal return is −2.7%, while on a sample of
80 firms for the period 1972 to 1982 Mikkelson and Partch (1986) find that
the two-day average abnormal return is −3.56%. In contrast, when firms
decide to use straight debt financing, the average abnormal return is closer
to zero. Mikkelson and Partch (1986) find the average abnormal return
for debt issues to be −0.23% for a sample of 171 issues. Findings such as
these provide the fuel for the development of new theories. For example,
these external financing results motivate the pecking order theory of capital
structure developed by Myers and Majluf (1984).

A major success related to those in the corporate finance area is the
implicit acceptance of event-study methodology by the U.S. Supreme Court
for determining materiality in insider trading cases and for determining
appropriate disgorgement amounts in cases of fraud. This implicit
acceptance in the 1988 Basic, Incorporated v. Levinson case and its importance
for securities law is discussed in Mitchell and Netter (1994).

There have also been less successful applications of event-study
methodology. An important characteristic of a successful event study is the ability
to identify precisely the date of the event. In cases where the date is difficult
to identify or the event is partially anticipated, event studies have been less
useful. For example, the wealth effects of regulatory changes for affected
entities can be difficult to detect using event-study methodology. The problem
is that regulatory changes are often debated in the political arena over time
and any accompanying wealth effects will be incorporated gradually into
the value of a corporation as the probability of the change being adopted
increases.

Dann and James (1982) discuss this issue in their study of the impact 
of deposit interest rate ceilings on thrift institutions. They look at changes
in rate ceilings, but decide not to consider a change in 1973 because it was
due to legislative action and hence was likely to have been anticipated by the 
market. Schipper and Thompson (1983, 1985) also encounter this problem 
in a study of merger-related regulations. They attempt to circumvent the 
problem of anticipated regulatory changes by identifying dates when the 
probability of a regulatory change increases or decreases. However, they 
find largely insignificant results, leaving open the possibility that the absence 
of distinct event dates accounts for the lack of wealth effects. 

Much has been learned from the body of research that uses event-study 
methodology. Most generally, event studies have shown that, as we would 
expect in a rational marketplace, prices do respond to new information. We 
expect that event studies will continue to be a valuable and widely used tool
in economics and finance.


Problems—Chapter 4 


4.1 Show that when using the market model to measure abnormal returns,
the sample abnormal returns from equation (4.4.7) are asymptotically
independent as the length of the estimation window ($L_1$) increases to infinity.


4.2 You are given the following information for an event. Abnormal returns
are sampled at an interval of one day. The event-window length is
three days. The mean abnormal return over the event window is 0.3% per
day. You have a sample of 50 event observations. The abnormal returns are
independent across the event observations as well as across event days for a
given event observation. For 25 of the event observations the daily standard
deviation of the abnormal return is 3% and for the remaining 25 observations
the daily standard deviation is 6%. Given this information, what would
be the power of the test for an event study using the cumulative abnormal
return test statistic in equation (4.4.22)? What would be the power using the
standardized cumulative abnormal return test statistic in equation (4.4.24)?
For the power calculations, assume the standard deviation of the abnormal
returns is known.


4.3 What would be the answers to question 4.2 if the mean abnormal return
is 0.6% per day for the 25 firms with the larger standard deviation?


ONE OF THE IMPORTANT PROBLEMS of modern financial economics is the 
quantification of the tradeoff between risk and expected return. Although 
common sense suggests that risky investments such as the stock market will 
generally yield higher returns than investments free of risk, it was only with
the development of the Capital Asset Pricing Model (CAPM) that economists
were able to quantify risk and the reward for bearing it. The CAPM implies
that the expected return of an asset must be linearly related to the covariance
of its return with the return of the market portfolio. In this chapter we
discuss the econometric analysis of this model.

The chapter is organized as follows. In Section 5.1 we briefly review 
the CAPM. Section 5.2 presents some results from efficient-set mathematics,
including those that are important for understanding the intuition of
econometric tests of the CAPM. The methodology for estimation and testing
is presented in Section 5.3. Some tests are based on large-sample statistical
theory, making the size of the test an issue, as we discuss in Section 5.4.
Section 5.5 considers the power of the tests, and Section 5.6 considers testing
with weaker distributional assumptions. Implementation issues are covered 
in Section 5.7, and Section 5.8 considers alternative approaches to testing 
based on cross-sectional regressions. 


5.1 Review of the CAPM 


Markowitz (1959) laid the groundwork for the CAPM. In this seminal research,
he cast the investor's portfolio selection problem in terms of expected
return and variance of return. He argued that investors would optimally
hold a mean-variance efficient portfolio, that is, a portfolio with the
highest expected return for a given level of variance. Sharpe (1964) and
Lintner (1965b) built on Markowitz's work to develop economy-wide
implications. They showed that if investors have homogeneous expectations
and optimally hold mean-variance efficient portfolios then, in the absence
of market frictions, the portfolio of all invested wealth, or the market
portfolio, will itself be a mean-variance efficient portfolio. The usual CAPM
equation is a direct implication of the mean-variance efficiency of the
market portfolio.

The Sharpe and Lintner derivations of the CAPM assume the existence
of lending and borrowing at a riskfree rate of interest. For this version of
the CAPM we have for the expected return of asset $i$,

$$E[R_i] = R_f + \beta_{im}\,(E[R_m] - R_f) \tag{5.1.1}$$
$$\beta_{im} = \frac{\mathrm{Cov}[R_i, R_m]}{\mathrm{Var}[R_m]}, \tag{5.1.2}$$

where $R_m$ is the return on the market portfolio, and $R_f$ is the return on the
riskfree asset. The Sharpe-Lintner version can be most compactly expressed
in terms of returns in excess of this riskfree rate, or in terms of excess returns.
Let $Z_i$ represent the return on the $i$th asset in excess of the riskfree rate,
$Z_i \equiv R_i - R_f$. Then for the Sharpe-Lintner CAPM we have

$$E[Z_i] = \beta_{im}\,E[Z_m] \tag{5.1.3}$$
$$\beta_{im} = \frac{\mathrm{Cov}[Z_i, Z_m]}{\mathrm{Var}[Z_m]}, \tag{5.1.4}$$

where $Z_m$ is the excess return on the market portfolio of assets. Because the
riskfree rate is treated as being nonstochastic, equations (5.1.2) and (5.1.4)
are equivalent. In empirical implementations, proxies for the riskfree rate
are stochastic and thus the betas can differ. Most empirical work relating to
the Sharpe-Lintner version employs excess returns and thus uses (5.1.4).

Empirical tests of the Sharpe-Lintner CAPM have focused on three im- 
plications of (5.1.3): (1) The intercept is zero; (2) Beta completely captures 
the cross-sectional variation of expected excess returns; and (3) The market 
risk premium, $E[Z_m]$, is positive. In much of this chapter we will focus on
the first implication; the last two implications will be considered later, in 
Section 5.8. 

In the absence of a riskfree asset, Black (1972) derived a more general
version of the CAPM. In this version, known as the Black version, the
expected return of asset $i$ in excess of the zero-beta return is linearly related
to its beta. Specifically, for the expected return of asset $i$, $E[R_i]$, we have

$$E[R_i] = E[R_{0m}] + \beta_{im}\,(E[R_m] - E[R_{0m}]). \tag{5.1.5}$$

$R_m$ is the return on the market portfolio, and $R_{0m}$ is the return on the zero-beta
portfolio associated with $m$. This portfolio is defined to be the portfolio
that has the minimum variance of all portfolios uncorrelated with $m$. (Any
other uncorrelated portfolio would have the same expected return, but a
higher variance.) Since it is wealth in real terms that is relevant, for the
Black model, returns are generally stated on an inflation-adjusted basis and
$\beta_{im}$ is defined in terms of real returns,


$$\beta_{im} = \frac{\mathrm{Cov}[R_i, R_m]}{\mathrm{Var}[R_m]}. \tag{5.1.6}$$

Econometric analysis of the Black version of the CAPM treats the zero-beta
portfolio return as an unobserved quantity, making the analysis more
complicated than that of the Sharpe-Lintner version. The Black version can be
tested as a restriction on the real-return market model. For the real-return
market model we have


$$E[R_i] = \alpha_{im} + \beta_{im}\,E[R_m], \tag{5.1.7}$$


and the implication of the Black version is 


$$\alpha_{im} = E[R_{0m}]\,(1 - \beta_{im}). \tag{5.1.8}$$


In words, the Black model restricts the asset-specific intercept of the real- 
return market model to be equal to the expected zero-beta portfolio return 
times one minus the asset's beta. 

The CAPM is a single-period model; hence (5.1.3) and (5.1.5) do not 
have a time dimension. For econometric analysis of the model, it is necessary
to add an assumption concerning the time-series behavior of returns and es- 
timate the model over time. We assume that returns are independently and 
identically distributed (IID) through time and jointly multivariate normal. 
This assumption applies to excess returns for the Sharpe-Lintner version 
and to real returns for the Black version. While the assumption is strong, 
it has the benefit of being theoretically consistent with the CAPM holding 
period by period; it is also a good empirical approximation for a monthly 
observation interval. We will discuss relaxing this assumption in Section 5.6. 

The CAPM can be useful for applications requiring a measure of expected
stock returns. Some applications include cost of capital estimation,
portfolio performance evaluation, and event-study analysis. As an example,
we briefly discuss its use for estimating the cost of capital. The cost of equity
capital is required for use in corporate capital budgeting decisions and in
the determination of a fair rate of return for regulated utilities. Implementation
of the model requires three inputs: the stock's beta, the market risk
premium, and the riskfree return. The usual estimator of beta of the equity
is the OLS estimator of the slope coefficient in the excess-return market
model, that is, the beta in the regression equation


$$Z_{it} = \alpha_{im} + \beta_{im} Z_{mt} + \epsilon_{it}, \tag{5.1.9}$$
where $i$ denotes the asset and $t$ denotes the time period, $t = 1, \ldots, T$. $Z_{it}$
and $Z_{mt}$ are the realized excess returns in time period $t$ for asset $i$ and the
market portfolio, respectively. Typically the Standard and Poor's 500 Index
serves as a proxy for the market portfolio, and the US Treasury bill rate
proxies for the riskfree return. The equation is most commonly estimated
using 5 years of monthly data ($T = 60$). Given an estimate of the beta, the
cost of capital is calculated using a historical average for the excess return
on the S&P 500 over Treasury bills. This sort of application is only justified
if the CAPM provides a good description of the data.
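As a rough illustration of the cost-of-capital recipe just described, the Python sketch below estimates beta by OLS from 60 months of excess returns and plugs it into the Sharpe-Lintner relation. The return series are simulated, and the riskfree rate and market premium are hypothetical inputs, not estimates from actual S&P 500 or Treasury bill data.

```python
import numpy as np

def ols_beta(z_i, z_m):
    """Intercept and slope of the excess-return market model (5.1.9) by OLS."""
    z_i = np.asarray(z_i, dtype=float)
    z_m = np.asarray(z_m, dtype=float)
    dm = z_m - z_m.mean()
    beta = np.dot(z_i - z_i.mean(), dm) / np.dot(dm, dm)
    alpha = z_i.mean() - beta * z_m.mean()
    return alpha, beta

def cost_of_equity(beta, riskfree, market_premium):
    """Sharpe-Lintner expected return: riskfree rate plus beta times the market premium."""
    return riskfree + beta * market_premium

# Simulated monthly excess returns (T = 60); illustrative only.
rng = np.random.default_rng(0)
z_m = rng.normal(0.006, 0.045, size=60)
z_i = 1.2 * z_m + rng.normal(0.0, 0.03, size=60)

alpha_hat, beta_hat = ols_beta(z_i, z_m)
# Hypothetical inputs: 0.4% monthly riskfree rate, 0.6% monthly historical premium.
k_e = cost_of_equity(beta_hat, riskfree=0.004, market_premium=0.006)
print(f"beta = {beta_hat:.3f}, monthly cost of equity = {k_e:.4%}")
```

In practice the estimated beta carries sampling error, so the resulting cost of equity inherits that uncertainty; the sketch reports only the point estimate.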


5.2 Results from Efficient-Set Mathematics 


In this section we review the mathematics of mean-variance efficient sets.
The interested reader is referred to Merton (1972) and Roll (1977) for
detailed treatments. An understanding of this topic is useful for interpreting
much of the empirical research relating to the CAPM, because the key
testable implication of the CAPM is that the market portfolio of risky assets
is a mean-variance efficient portfolio. Efficient-set mathematics also plays a
role in the analysis of multifactor pricing models in Chapter 6.

We start with some notation. Let there be $N$ risky assets with mean
vector $\mu$ and covariance matrix $\Omega$. Assume that the expected returns of at
least two assets differ and that the covariance matrix is of full rank. Define
$w_a$ as the $(N \times 1)$ vector of portfolio weights for an arbitrary portfolio $a$ with
weights summing to unity. Portfolio $a$ has mean return $\mu_a = w_a'\mu$ and
variance $\sigma_a^2 = w_a'\Omega w_a$. The covariance between any two portfolios $a$ and
$b$ is $w_a'\Omega w_b$. Given the population of assets we next consider minimum-variance
portfolios in the absence of a riskfree asset.


Definition. Portfolio $p$ is the minimum-variance portfolio of all portfolios with mean
return $\mu_p$ if its portfolio weight vector is the solution to the following constrained
optimization:

$$\min_{w}\; w'\Omega w \tag{5.2.1}$$

subject to

$$w'\mu = \mu_p \tag{5.2.2}$$
$$w'\iota = 1. \tag{5.2.3}$$

To solve this problem, we form the Lagrangian function $L$, differentiate with
respect to $w$, set the resulting equations to zero, and then solve for $w$. For
the Lagrangian function we have

$$L = w'\Omega w + \delta_1(\mu_p - w'\mu) + \delta_2(1 - w'\iota), \tag{5.2.4}$$

where $\iota$ is a conforming vector of ones and $\delta_1$ and $\delta_2$ are Lagrange
multipliers. Differentiating $L$ with respect to $w$ and setting the result equal to zero,
we have

$$2\Omega w - \delta_1\mu - \delta_2\iota = 0. \tag{5.2.5}$$

Combining (5.2.5) with (5.2.2) and (5.2.3) we find the solution

$$w_p = g + h\,\mu_p, \tag{5.2.6}$$

where $g$ and $h$ are $(N \times 1)$ vectors,

$$g = \frac{1}{D}\left[B(\Omega^{-1}\iota) - A(\Omega^{-1}\mu)\right] \tag{5.2.7}$$
$$h = \frac{1}{D}\left[C(\Omega^{-1}\mu) - A(\Omega^{-1}\iota)\right], \tag{5.2.8}$$

and $A = \iota'\Omega^{-1}\mu$, $B = \mu'\Omega^{-1}\mu$, $C = \iota'\Omega^{-1}\iota$, and $D = BC - A^2$.

Next we summarize a number of results from efficient-set mathematics
for minimum-variance portfolios. These results follow from the form of the
solution for the minimum-variance portfolio weights in (5.2.6).
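The closed-form solution (5.2.6)-(5.2.8) is easy to check numerically. The sketch below computes $A$, $B$, $C$, $D$, $g$, and $h$ for a hypothetical three-asset covariance matrix (the numbers are illustrative, not data) and forms the minimum-variance portfolio for a chosen target mean.

```python
import numpy as np

def frontier_constants(mu, omega):
    """Return A, B, C, D and the vectors g, h of (5.2.6)-(5.2.8)."""
    iota = np.ones(len(mu))
    oi_iota = np.linalg.solve(omega, iota)  # Omega^{-1} iota
    oi_mu = np.linalg.solve(omega, mu)      # Omega^{-1} mu
    A = iota @ oi_mu
    B = mu @ oi_mu
    C = iota @ oi_iota
    D = B * C - A**2
    g = (B * oi_iota - A * oi_mu) / D
    h = (C * oi_mu - A * oi_iota) / D
    return A, B, C, D, g, h

def min_variance_weights(mu, omega, mu_p):
    """Weights of the minimum-variance portfolio with mean mu_p: w_p = g + h mu_p."""
    *_, g, h = frontier_constants(mu, omega)
    return g + h * mu_p

# Hypothetical three-asset example.
mu = np.array([0.08, 0.12, 0.15])
omega = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.020],
                  [0.010, 0.020, 0.160]])
w_p = min_variance_weights(mu, omega, mu_p=0.11)
```

Perturbing $w_p$ in any direction that keeps both constraints satisfied can only raise the variance, which gives a quick sanity check on the formulas.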


Result 1: The minimum-variance frontier can be generated from any two 
distinct minimum-variance portfolios. 


Result 1': Any portfolio of minimum-variance portfolios is also a minimum-variance
portfolio.


Result 2: Let $p$ and $r$ be any two minimum-variance portfolios. The covariance
of the return of $p$ with the return of $r$ is

$$\mathrm{Cov}[R_p, R_r] = \frac{C}{D}\left(\mu_p - \frac{A}{C}\right)\left(\mu_r - \frac{A}{C}\right) + \frac{1}{C}. \tag{5.2.9}$$


Result 3: Define portfolio $g$ as the global minimum-variance portfolio. For
portfolio $g$, we have

$$w_g = \frac{1}{C}\,\Omega^{-1}\iota \tag{5.2.10}$$
$$\mu_g = \frac{A}{C} \tag{5.2.11}$$
$$\sigma_g^2 = \frac{1}{C}. \tag{5.2.12}$$


Result 4: For each minimum-variance portfolio $p$, except the global minimum-variance
portfolio $g$, there exists a unique minimum-variance portfolio that has zero
covariance with $p$. This portfolio is called the zero-beta portfolio with
respect to $p$.

Result 4': The covariance of the return of the global minimum-variance
portfolio $g$ with any asset or portfolio of assets $a$ is

$$\mathrm{Cov}[R_g, R_a] = \frac{1}{C}. \tag{5.2.13}$$
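Results 2, 3, and 4' can be verified in a hypothetical three-asset setting. The sketch below rebuilds the frontier from (5.2.6) and compares the covariance formula (5.2.9), the global minimum-variance moments (5.2.10)-(5.2.12), and (5.2.13) against direct matrix calculations. The inputs are illustrative only.

```python
import numpy as np

# Hypothetical inputs (illustrative only).
mu = np.array([0.08, 0.12, 0.15])
omega = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.020],
                  [0.010, 0.020, 0.160]])
iota = np.ones(3)
oi_iota = np.linalg.solve(omega, iota)
oi_mu = np.linalg.solve(omega, mu)
A, B, C = iota @ oi_mu, mu @ oi_mu, iota @ oi_iota
D = B * C - A**2

def frontier(mu_p):
    """Minimum-variance weights w_p = g + h mu_p, equation (5.2.6)."""
    return ((B * oi_iota - A * oi_mu) + mu_p * (C * oi_mu - A * oi_iota)) / D

def frontier_cov(mu_p, mu_r):
    """Covariance of two frontier portfolios, Result 2, equation (5.2.9)."""
    return (C / D) * (mu_p - A / C) * (mu_r - A / C) + 1.0 / C

w_g = oi_iota / C                 # global minimum-variance portfolio, (5.2.10)
w_p, w_r = frontier(0.10), frontier(0.14)
w_a = np.array([0.5, 0.3, 0.2])   # an arbitrary fully invested portfolio
```

Result 4' follows because $w_g'\Omega w_a = \iota'w_a / C = 1/C$ for any weights that sum to one, which the arbitrary portfolio `w_a` illustrates.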


Figure 5.1 illustrates the set of minimum-variance portfolios in the
absence of a riskfree asset in mean-standard deviation space. Minimum-variance
portfolios with an expected return greater than or equal to the
expected return of the global minimum-variance portfolio are efficient
portfolios. These portfolios have the highest expected return of all portfolios
with an equal or lower variance of return. In Figure 5.1 the global minimum-variance
portfolio is $g$. Portfolio $p$ is an efficient portfolio. Portfolio $0p$ is
the zero-beta portfolio with respect to $p$. It can be shown that it plots in
the location shown in Figure 5.1, that is, the expected return on the zero-beta
portfolio is the expected return on portfolio $p$, less the slope of the
minimum-variance frontier at $p$ times the standard deviation of portfolio $p$.


Result 5: Consider a multiple regression of the return on any asset or portfolio
$R_a$ on the return of any minimum-variance portfolio $R_p$ (except for
the global minimum-variance portfolio) and the return of its associated
zero-beta portfolio $R_{0p}$:

$$R_a = \beta_0 + \beta_1 R_{0p} + \beta_2 R_p + \epsilon_p \tag{5.2.14}$$
$$E[\epsilon_p \mid R_p, R_{0p}] = 0. \tag{5.2.15}$$

For the regression coefficients we have

$$\beta_2 = \frac{\mathrm{Cov}[R_a, R_p]}{\sigma_p^2} = \beta_{ap} \tag{5.2.16}$$
$$\beta_1 = \frac{\mathrm{Cov}[R_a, R_{0p}]}{\sigma_{0p}^2} = 1 - \beta_{ap} \tag{5.2.17}$$
$$\beta_0 = 0, \tag{5.2.18}$$

where $\beta_{ap}$ is the beta of asset $a$ with respect to portfolio $p$.

Result 5': For the expected return of $a$ we have

$$\mu_a = (1 - \beta_{ap})\,\mu_{0p} + \beta_{ap}\,\mu_p. \tag{5.2.19}$$
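Setting the covariance in Result 2 to zero gives the zero-beta mean in closed form, $\mu_{0p} = A/C - (D/C^2)/(\mu_p - A/C)$, a one-line rearrangement of (5.2.9). The sketch below uses that expression with hypothetical inputs to check Result 4 and the expected-return relation (5.2.19) for an arbitrary portfolio $a$.

```python
import numpy as np

# Hypothetical inputs (illustrative only).
mu = np.array([0.08, 0.12, 0.15])
omega = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.020],
                  [0.010, 0.020, 0.160]])
iota = np.ones(3)
oi_iota = np.linalg.solve(omega, iota)
oi_mu = np.linalg.solve(omega, mu)
A, B, C = iota @ oi_mu, mu @ oi_mu, iota @ oi_iota
D = B * C - A**2

def frontier(mu_p):
    """Minimum-variance weights from (5.2.6)."""
    return ((B * oi_iota - A * oi_mu) + mu_p * (C * oi_mu - A * oi_iota)) / D

def zero_beta_mean(mu_p):
    """Mean of the frontier portfolio uncorrelated with p (set (5.2.9) to zero)."""
    return A / C - (D / C**2) / (mu_p - A / C)

mu_p = 0.13
w_p = frontier(mu_p)
mu_0p = zero_beta_mean(mu_p)
w_0p = frontier(mu_0p)

w_a = np.array([0.2, 0.5, 0.3])   # arbitrary fully invested portfolio
beta_ap = (w_a @ omega @ w_p) / (w_p @ omega @ w_p)
mu_a_implied = (1 - beta_ap) * mu_0p + beta_ap * mu_p   # Result 5', (5.2.19)
```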

We next introduce a riskfree asset into the analysis and consider portfolios
composed of a combination of the $N$ risky assets and the riskfree asset.
With a riskfree asset the portfolio weights of the risky assets are not
constrained to sum to 1, since $(1 - w'\iota)$ can be invested in the riskfree asset.

Figure 5.1. Minimum-Variance Portfolios Without Riskfree Asset


Given a riskfree asset with return $R_f$, the minimum-variance portfolio with
expected return $\mu_p$ will be the solution to the constrained optimization

$$\min_{w}\; w'\Omega w \tag{5.2.20}$$

subject to

$$w'\mu + (1 - w'\iota)R_f = \mu_p. \tag{5.2.21}$$

As in the prior problem, we form the Lagrangian function $L$, differentiate
it with respect to $w$, set the resulting equations to zero, and then solve for
$w$. For the Lagrangian function we have

$$L = w'\Omega w + \delta\left(\mu_p - w'\mu - (1 - w'\iota)R_f\right). \tag{5.2.22}$$

Differentiating $L$ with respect to $w$ and setting the result equal to zero, we
have

$$2\Omega w - \delta(\mu - R_f\iota) = 0. \tag{5.2.23}$$

Combining (5.2.23) with (5.2.21) we have

$$w_p = \frac{\mu_p - R_f}{(\mu - R_f\iota)'\,\Omega^{-1}(\mu - R_f\iota)}\;\Omega^{-1}(\mu - R_f\iota). \tag{5.2.24}$$

Note that we can express $w_p$ as a scalar which depends on the mean of $p$
times a portfolio weight vector which does not depend on $p$,

$$w_p = c_p\,\bar{w}, \tag{5.2.25}$$

where

$$c_p = \frac{\mu_p - R_f}{(\mu - R_f\iota)'\,\Omega^{-1}(\mu - R_f\iota)} \tag{5.2.26}$$

and

$$\bar{w} = \Omega^{-1}(\mu - R_f\iota). \tag{5.2.27}$$


Thus with a riskfree asset all minimum-variance portfolios are a combination
of a given risky asset portfolio with weights proportional to $\bar{w}$ and the riskfree
asset. This portfolio of risky assets is called the tangency portfolio and has
weight vector

$$w_q = \frac{1}{\iota'\,\Omega^{-1}(\mu - R_f\iota)}\;\Omega^{-1}(\mu - R_f\iota). \tag{5.2.28}$$

We use the subscript $q$ to identify the tangency portfolio. Equation (5.2.28)
divides the elements of $\bar{w}$ by their sum to get a vector whose elements sum
to one, that is, a portfolio weight vector. Figure 5.2 illustrates the set of
minimum-variance portfolios in the presence of a riskfree asset. With a
riskfree asset all efficient portfolios lie along the line from the riskfree asset
through portfolio $q$.

The expected excess return per unit risk is useful to provide a basis for
economic interpretation of tests of the CAPM. The Sharpe ratio measures
this quantity. For any asset or portfolio $a$, the Sharpe ratio is defined as the
mean excess return divided by the standard deviation of return,

$$sr_a = \frac{\mu_a - R_f}{\sigma_a}. \tag{5.2.29}$$

In Figure 5.2 the Sharpe ratio is the slope of the line from the riskfree return
$(R_f, 0)$ to the portfolio $(\mu_a, \sigma_a)$. The tangency portfolio $q$ can be characterized
as the portfolio with the maximum Sharpe ratio of all portfolios of risky
assets. Testing the mean-variance efficiency of a given portfolio is equivalent
to testing whether the Sharpe ratio of that portfolio is the maximum of the
set of Sharpe ratios of all possible portfolios.
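The tangency weights (5.2.28) and the Sharpe ratio (5.2.29) can be computed directly. In the sketch below (hypothetical inputs, including an assumed riskfree rate of 5%), the largest attainable Sharpe ratio is $\sqrt{(\mu - R_f\iota)'\Omega^{-1}(\mu - R_f\iota)}$, a standard consequence of the Cauchy-Schwarz inequality. The tangency portfolio attains it when $\iota'\Omega^{-1}(\mu - R_f\iota) > 0$.

```python
import numpy as np

# Hypothetical inputs (illustrative only).
mu = np.array([0.08, 0.12, 0.15])
omega = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.020],
                  [0.010, 0.020, 0.160]])
rf = 0.05  # assumed riskfree rate

excess = mu - rf
w_bar = np.linalg.solve(omega, excess)   # Omega^{-1}(mu - rf iota), (5.2.27)
w_q = w_bar / w_bar.sum()                # tangency weights, (5.2.28)

def sharpe(w):
    """Sharpe ratio (5.2.29) of a fully invested portfolio of the risky assets."""
    return (w @ mu - rf) / np.sqrt(w @ omega @ w)

max_sr = np.sqrt(excess @ w_bar)
print(f"tangency Sharpe ratio = {sharpe(w_q):.4f}, bound = {max_sr:.4f}")
```

Any other fully invested portfolio, such as randomly drawn weights rescaled to sum to one, should report a Sharpe ratio no larger than `max_sr`.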


5.3 Statistical Framework for Estimation and Testing

Initially we use the assumption that investors can borrow and lend at a
riskfree rate of return, and we consider the Sharpe-Lintner version of the
CAPM. Then, we eliminate this assumption and analyze the Black version.

Figure 5.2. Minimum-Variance Portfolios With Riskfree Asset


5.3.1 Sharpe-Lintner Version 


Define $Z_t$ as an $(N \times 1)$ vector of excess returns for $N$ assets (or portfolios
of assets). For these $N$ assets, the excess returns can be described using the
excess-return market model:

$$Z_t = \alpha + \beta Z_{mt} + \epsilon_t \tag{5.3.1}$$
$$E[\epsilon_t] = 0 \tag{5.3.2}$$
$$E[\epsilon_t\epsilon_t'] = \Sigma \tag{5.3.3}$$
$$E[Z_{mt}] = \mu_m, \qquad E[(Z_{mt} - \mu_m)^2] = \sigma_m^2 \tag{5.3.4}$$
$$\mathrm{Cov}[Z_{mt}, \epsilon_t] = 0. \tag{5.3.5}$$

$\beta$ is the $(N \times 1)$ vector of betas, $Z_{mt}$ is the time period $t$ market portfolio
excess return, and $\alpha$ and $\epsilon_t$ are $(N \times 1)$ vectors of asset return intercepts and
disturbances, respectively. As will be the case throughout this chapter we
have suppressed the dependence of $\alpha$, $\beta$, and $\epsilon_t$ on the market portfolio or
its proxy. For convenience, with the Sharpe-Lintner version, we redefine $\mu$
to refer to the expected excess return.

The implication of the Sharpe-Lintner version of the CAPM for (5.3.1)
is that all of the elements of the vector $\alpha$ are zero. This implication follows
from comparing the unconditional expectation of (5.3.1) to (5.1.3) and
forms the principal hypothesis for tests of the model. If all elements of $\alpha$
are zero then $m$ is the tangency portfolio.

We use the maximum likelihood approach to develop estimators of
the unconstrained model. Ordinary least squares (OLS) regressions asset
by asset lead to the same estimators for $\alpha$ and $\beta$. To start, we consider
the probability density function (pdf) of excess returns conditional on the
excess return of the market. Given the assumed joint normality of excess
returns, for the pdf of $Z_t$ we have


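$$f(Z_t \mid Z_{mt}) = (2\pi)^{-N/2}\,|\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(Z_t - \alpha - \beta Z_{mt})'\,\Sigma^{-1}(Z_t - \alpha - \beta Z_{mt})\right], \tag{5.3.6}$$

and since excess returns are temporally IID, given $T$ observations, the joint
probability density function is

$$f(Z_1, Z_2, \ldots, Z_T \mid Z_{m1}, Z_{m2}, \ldots, Z_{mT}) = \prod_{t=1}^{T} f(Z_t \mid Z_{mt}) \tag{5.3.7}$$
$$= \prod_{t=1}^{T} (2\pi)^{-N/2}\,|\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(Z_t - \alpha - \beta Z_{mt})'\,\Sigma^{-1}(Z_t - \alpha - \beta Z_{mt})\right]. \tag{5.3.8}$$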


Given (5.3.8) and the excess-return observations, the parameters of
the excess-return market model can be estimated using maximum likelihood.
This approach is desirable because, given certain regularity conditions,
maximum likelihood estimators are consistent, asymptotically efficient, and
asymptotically normal. To define the maximum likelihood estimator, we
form the log-likelihood function, that is, the logarithm of the joint probability
density function viewed as a function of the unknown parameters, $\alpha$, $\beta$, and
$\Sigma$. Denoting $\mathcal{L}$ as the log-likelihood function we have:

$$\mathcal{L}(\alpha, \beta, \Sigma) = -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\sum_{t=1}^{T}(Z_t - \alpha - \beta Z_{mt})'\,\Sigma^{-1}(Z_t - \alpha - \beta Z_{mt}). \tag{5.3.9}$$
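
The maximum likelihood estimators are the values of the parameters which
maximize $\mathcal{L}$. To find these estimators, we differentiate $\mathcal{L}$ with respect to $\alpha$,
$\beta$, and $\Sigma$, and set the resulting equations to zero. The partial derivatives
are

$$\frac{\partial\mathcal{L}}{\partial\alpha} = \Sigma^{-1}\left[\sum_{t=1}^{T}(Z_t - \alpha - \beta Z_{mt})\right] \tag{5.3.10}$$
$$\frac{\partial\mathcal{L}}{\partial\beta} = \Sigma^{-1}\left[\sum_{t=1}^{T}(Z_t - \alpha - \beta Z_{mt})\,Z_{mt}\right] \tag{5.3.11}$$
$$\frac{\partial\mathcal{L}}{\partial\Sigma} = -\frac{T}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}\left[\sum_{t=1}^{T}(Z_t - \alpha - \beta Z_{mt})(Z_t - \alpha - \beta Z_{mt})'\right]\Sigma^{-1}. \tag{5.3.12}$$

Setting (5.3.10), (5.3.11), and (5.3.12) to zero, we can solve for the maximum
likelihood estimators. These are

$$\hat\alpha = \hat\mu - \hat\beta\hat\mu_m \tag{5.3.13}$$
$$\hat\beta = \frac{\sum_{t=1}^{T}(Z_t - \hat\mu)(Z_{mt} - \hat\mu_m)}{\sum_{t=1}^{T}(Z_{mt} - \hat\mu_m)^2} \tag{5.3.14}$$
$$\hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}(Z_t - \hat\alpha - \hat\beta Z_{mt})(Z_t - \hat\alpha - \hat\beta Z_{mt})', \tag{5.3.15}$$

where

$$\hat\mu = \frac{1}{T}\sum_{t=1}^{T} Z_t \quad\text{and}\quad \hat\mu_m = \frac{1}{T}\sum_{t=1}^{T} Z_{mt}.$$

As already noted, these are just the formulas for OLS estimators of the
parameters.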
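Because the estimators (5.3.13)-(5.3.15) coincide with OLS run asset by asset, they can be computed in a few vectorized lines. The sketch below does so on simulated data (the parameter values are hypothetical). Note that $\hat\Sigma$ divides by $T$, as the maximum likelihood estimator does, rather than applying a degrees-of-freedom correction.

```python
import numpy as np

def market_model_mle(Z, z_m):
    """Unconstrained MLE (5.3.13)-(5.3.15).

    Z   : (T, N) array of asset excess returns
    z_m : (T,) array of market excess returns
    """
    T = Z.shape[0]
    mu_hat = Z.mean(axis=0)
    mu_m = z_m.mean()
    dm = z_m - mu_m
    beta = (Z - mu_hat).T @ dm / (dm @ dm)     # (5.3.14)
    alpha = mu_hat - beta * mu_m               # (5.3.13)
    resid = Z - alpha - np.outer(z_m, beta)
    sigma = resid.T @ resid / T                # (5.3.15): divide by T
    return alpha, beta, sigma

# Simulated data with hypothetical parameters.
rng = np.random.default_rng(42)
T, N = 120, 4
z_m = rng.normal(0.005, 0.04, T)
Z = 0.002 + np.outer(z_m, [0.8, 1.0, 1.2, 1.5]) + rng.normal(0.0, 0.02, (T, N))
alpha_hat, beta_hat, sigma_hat = market_model_mle(Z, z_m)
```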

The distributions of the maximum likelihood estimators conditional on
the excess return of the market, $Z_{m1}, Z_{m2}, \ldots, Z_{mT}$, follow from the assumed
joint normality of excess returns and the IID assumption. The variances and
covariances of the estimators can be derived using the inverse of the Fisher
information matrix. As discussed in the Appendix, the Fisher information
matrix is minus the expectation of the second order derivative of the
log-likelihood function with respect to the vector of the parameters. The
conditional distributions are

$$\hat\alpha \sim \mathcal{N}\left(\alpha,\ \frac{1}{T}\left[1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right]\Sigma\right) \tag{5.3.16}$$
$$\hat\beta \sim \mathcal{N}\left(\beta,\ \frac{1}{T}\,\frac{1}{\hat\sigma_m^2}\,\Sigma\right) \tag{5.3.17}$$
$$T\hat\Sigma \sim \mathcal{W}_N(T - 2, \Sigma), \tag{5.3.18}$$

where $\hat\mu_m$ is as previously defined and

$$\hat\sigma_m^2 = \frac{1}{T}\sum_{t=1}^{T}(Z_{mt} - \hat\mu_m)^2.$$

The notation $\mathcal{W}_N(T - 2, \Sigma)$ indicates that the $(N \times N)$ matrix $T\hat\Sigma$ has a
Wishart distribution with $(T - 2)$ degrees of freedom and covariance matrix
$\Sigma$. This distribution is a multivariate generalization of the chi-square
distribution. Anderson (1984) and Muirhead (1983) provide discussions of
its properties.

The covariance of $\hat\alpha$ and $\hat\beta$ is

$$\mathrm{Cov}[\hat\alpha, \hat\beta] = -\frac{1}{T}\,\frac{\hat\mu_m}{\hat\sigma_m^2}\,\Sigma. \tag{5.3.19}$$

$\hat\Sigma$ is independent of both $\hat\alpha$ and $\hat\beta$.


Using the unconstrained estimators, we can form a Wald test statistic of
the null hypothesis,

$$H_0: \alpha = 0 \tag{5.3.20}$$

against the alternative hypothesis,

$$H_A: \alpha \neq 0. \tag{5.3.21}$$

The Wald test statistic is

$$J_0 = \hat\alpha'\left[\mathrm{Var}[\hat\alpha]\right]^{-1}\hat\alpha = T\left[1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right]^{-1}\hat\alpha'\,\Sigma^{-1}\hat\alpha \;\overset{a}{\sim}\; \chi^2_N, \tag{5.3.22}$$

where we have substituted from (5.3.16) for $\mathrm{Var}[\hat\alpha]$. Under the null hypothesis
$J_0$ will have a chi-square distribution with $N$ degrees of freedom. Since
$\Sigma$ is unknown, to use $J_0$ for testing $H_0$, we substitute a consistent estimator
for $\Sigma$ in (5.3.22) and then asymptotically the null distribution will be
chi-square with $N$ degrees of freedom. The maximum likelihood estimator of
$\Sigma$ can serve as a consistent estimator.

However, in this case we need not resort to large-sample distribution theory
to draw inferences using a Wald-type test. The finite-sample distribution,
which is developed in MacKinlay (1987) and Gibbons, Ross, and Shanken
(1989), can be determined by applying the following theorem presented in
Muirhead (1983):

Theorem. Let the $m$-vector $x$ be distributed $\mathcal{N}(0, \Omega)$, let the $(m \times m)$ matrix $A$ be
distributed $\mathcal{W}_m(n, \Omega)$ with $(n \geq m)$, and let $x$ and $A$ be independent. Then:

$$\frac{n - m + 1}{m}\; x'A^{-1}x \sim F_{m,\,n-m+1}.$$

To apply this theorem we set $x = \sqrt{T}\left[1 + \hat\mu_m^2/\hat\sigma_m^2\right]^{-1/2}\hat\alpha$, $A = T\hat\Sigma$,
$m = N$, and $n = (T - 2)$. Then defining $J_1$ as the test statistic we have

$$J_1 = \frac{T - N - 1}{N}\left[1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right]^{-1}\hat\alpha'\,\hat\Sigma^{-1}\hat\alpha \sim F_{N,\,T-N-1}. \tag{5.3.23}$$

Under the null hypothesis, $J_1$ is unconditionally distributed central $F$ with
$N$ degrees of freedom in the numerator and $(T - N - 1)$ degrees of freedom
in the denominator.
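The Wald statistic (5.3.22), with $\hat\Sigma$ substituted for $\Sigma$, and the finite-sample statistic (5.3.23) share the same quadratic form, so $J_1 = \frac{T-N-1}{NT}J_0$. The sketch below computes both on data simulated under $H_0$ ($\alpha = 0$). The simulation design is hypothetical. Computing p-values would need an F or chi-square CDF, for example from SciPy, which is left out to keep the block dependency-light.

```python
import numpy as np

def wald_and_f_stats(Z, z_m):
    """Return (J0, J1) for H0: alpha = 0 in the excess-return market model."""
    T, N = Z.shape
    mu_m = z_m.mean()
    dm = z_m - mu_m
    sig2_m = dm @ dm / T                              # MLE market variance
    beta = (Z - Z.mean(axis=0)).T @ dm / (dm @ dm)
    alpha = Z.mean(axis=0) - beta * mu_m
    resid = Z - alpha - np.outer(z_m, beta)
    sigma = resid.T @ resid / T                       # Sigma-hat (MLE)
    quad = alpha @ np.linalg.solve(sigma, alpha) / (1.0 + mu_m**2 / sig2_m)
    J0 = T * quad                    # asymptotically chi-square(N) under H0
    J1 = (T - N - 1) / N * quad      # exactly F(N, T-N-1) under H0 and normality
    return J0, J1

# Data simulated under H0 (hypothetical design).
rng = np.random.default_rng(5)
T, N = 60, 5
betas = np.linspace(0.6, 1.4, N)

def simulate():
    z_m = rng.normal(0.006, 0.045, T)
    Z = np.outer(z_m, betas) + rng.normal(0.0, 0.03, (T, N))
    return Z, z_m

J0, J1 = wald_and_f_stats(*simulate())
```

Averaging $J_1$ over many simulated samples should give a value close to the $F_{N,T-N-1}$ mean, $(T-N-1)/(T-N-3)$.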

We can construct the Wald test $J_0$ and the finite-sample F-test $J_1$ using
only the estimators from the unconstrained model, that is, the excess-return
market model. To consider a third test, the likelihood ratio test, we need
the estimators of the constrained model. For the constrained model, the
Sharpe-Lintner CAPM, the estimators follow from solving for $\beta$ and $\Sigma$ from
(5.3.11) and (5.3.12) with $\alpha$ constrained to be zero. The constrained
estimators are

$$\hat\beta^* = \frac{\sum_{t=1}^{T} Z_{mt} Z_t}{\sum_{t=1}^{T} Z_{mt}^2} \tag{5.3.24}$$
$$\hat\Sigma^* = \frac{1}{T}\sum_{t=1}^{T}(Z_t - \hat\beta^* Z_{mt})(Z_t - \hat\beta^* Z_{mt})'. \tag{5.3.25}$$

The distributions of the constrained estimators under the null hypothesis
are

$$\hat\beta^* \sim \mathcal{N}\left(\beta,\ \frac{1}{T}\,\frac{1}{\hat\mu_m^2 + \hat\sigma_m^2}\,\Sigma\right) \tag{5.3.26}$$
$$T\hat\Sigma^* \sim \mathcal{W}_N(T - 1, \Sigma). \tag{5.3.27}$$


Given both the unconstrained and constrained maximum likelihood estimators,
we can test the restrictions implied by the Sharpe-Lintner version
using the likelihood ratio test. This test is based on the logarithm of the
likelihood ratio, which is the value of the constrained log-likelihood function
minus the unconstrained log-likelihood function evaluated at the maximum
likelihood estimators. Denoting $\mathcal{LR}$ as the log-likelihood ratio, we have

$$\mathcal{LR} = \mathcal{L}^* - \mathcal{L} = -\frac{T}{2}\left[\log|\hat\Sigma^*| - \log|\hat\Sigma|\right], \tag{5.3.28}$$

where $\mathcal{L}^*$ represents the constrained log-likelihood function. To derive
(5.3.28) we have used the fact that the summation in the last term in both
the unconstrained and constrained likelihood functions evaluated at the
maximum likelihood estimators simplifies to $NT$. We now show this for
the unconstrained function. For the summation of the last term in (5.3.9),
evaluated at the maximum likelihood estimators, we have

$$\sum_{t=1}^{T}(Z_t - \hat\alpha - \hat\beta Z_{mt})'\,\hat\Sigma^{-1}(Z_t - \hat\alpha - \hat\beta Z_{mt}) \tag{5.3.29}$$
$$= \sum_{t=1}^{T}\mathrm{trace}\left[\hat\Sigma^{-1}(Z_t - \hat\alpha - \hat\beta Z_{mt})(Z_t - \hat\alpha - \hat\beta Z_{mt})'\right] \tag{5.3.30}$$
$$= \mathrm{trace}\left[\hat\Sigma^{-1}\sum_{t=1}^{T}(Z_t - \hat\alpha - \hat\beta Z_{mt})(Z_t - \hat\alpha - \hat\beta Z_{mt})'\right] \tag{5.3.31}$$
$$= \mathrm{trace}\left[\hat\Sigma^{-1}(T\hat\Sigma)\right] = T\,\mathrm{trace}[I] = NT. \tag{5.3.32}$$

The step from (5.3.29) to (5.3.30) uses the result that a scalar equals its own
trace and that trace $AB$ = trace $BA$, and the step to (5.3.31) uses the result
that the trace of a sum is equal to the sum of the traces. In (5.3.32) we use the
result that the trace of the identity matrix is equal to its dimension.
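The trace argument in (5.3.29)-(5.3.32) can be confirmed numerically. For both the unconstrained fit and the constrained ($\alpha = 0$) fit, the sum of residual quadratic forms, each weighted by the inverse of that fit's own MLE covariance, equals $NT$. The data below are simulated with hypothetical parameters.

```python
import numpy as np

rng = np.random.default_rng(7)
T, N = 60, 3
z_m = rng.normal(0.005, 0.04, T)
Z = np.outer(z_m, [0.9, 1.1, 1.3]) + rng.normal(0.0, 0.02, (T, N))

def quad_form_sum(resid):
    """Sum over t of e_t' S^{-1} e_t, with S the MLE covariance of the residuals."""
    S = resid.T @ resid / resid.shape[0]
    return np.einsum("ti,ij,tj->", resid, np.linalg.inv(S), resid)

# Unconstrained fit (intercept and slope), asset by asset via least squares.
X = np.column_stack([np.ones(T), z_m])
coef = np.linalg.lstsq(X, Z, rcond=None)[0]
q_unconstrained = quad_form_sum(Z - X @ coef)

# Constrained fit (alpha = 0), equation (5.3.24).
beta_star = Z.T @ z_m / (z_m @ z_m)
q_constrained = quad_form_sum(Z - np.outer(z_m, beta_star))
```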

The test is based on the asymptotic result that, under the null hypothesis,
$-2$ times the logarithm of the likelihood ratio is distributed chi-square with
degrees of freedom equal to the number of restrictions under $H_0$. That is,
we can test $H_0$ using

$$J_2 = -2\,\mathcal{LR} = T\left[\log|\hat\Sigma^*| - \log|\hat\Sigma|\right] \;\overset{a}{\sim}\; \chi^2_N. \tag{5.3.33}$$

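Interestingly, here we need not resort to large-sample theory to conduct
a likelihood ratio test. $J_1$ in (5.3.23) is itself a likelihood ratio test
statistic. This result, which we next develop, follows from the fact that $J_1$ is
a monotonic transformation of $J_2$. The constrained maximum likelihood
estimators can be expressed in terms of the unconstrained estimators. For
$\hat\beta^*$ we have

$$\hat\beta^* = \hat\beta + \frac{\hat\mu_m}{\hat\mu_m^2 + \hat\sigma_m^2}\,\hat\alpha, \tag{5.3.34}$$

and for $\hat\Sigma^*$ we have

$$\hat\Sigma^* = \frac{1}{T}\sum_{t=1}^{T}\left[(Z_t - \hat\alpha - \hat\beta Z_{mt}) + \left(1 - \frac{\hat\mu_m Z_{mt}}{\hat\mu_m^2 + \hat\sigma_m^2}\right)\hat\alpha\right]\left[(Z_t - \hat\alpha - \hat\beta Z_{mt}) + \left(1 - \frac{\hat\mu_m Z_{mt}}{\hat\mu_m^2 + \hat\sigma_m^2}\right)\hat\alpha\right]'. \tag{5.3.35}$$

Noting that

$$\sum_{t=1}^{T}(Z_t - \hat\alpha - \hat\beta Z_{mt})\left(1 - \frac{\hat\mu_m Z_{mt}}{\hat\mu_m^2 + \hat\sigma_m^2}\right) = 0, \tag{5.3.36}$$

we have

$$\hat\Sigma^* = \hat\Sigma + \frac{\hat\sigma_m^2}{\hat\mu_m^2 + \hat\sigma_m^2}\,\hat\alpha\hat\alpha'. \tag{5.3.37}$$

Taking the determinant of both sides we have

$$|\hat\Sigma^*| = |\hat\Sigma|\left(\left[1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right]^{-1}\hat\alpha'\,\hat\Sigma^{-1}\hat\alpha + 1\right), \tag{5.3.38}$$

where to go from (5.3.37) to (5.3.38) we factorize $\hat\Sigma$ and use the result that
$|I + xx'| = (1 + x'x)$ for the identity matrix $I$ and a vector $x$. Substituting
(5.3.38) into (5.3.28) gives

$$\mathcal{LR} = -\frac{T}{2}\log\left(\left[1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right]^{-1}\hat\alpha'\,\hat\Sigma^{-1}\hat\alpha + 1\right), \tag{5.3.39}$$

and for $J_1$ we have

$$J_1 = \frac{T - N - 1}{N}\left(\exp\left[\frac{J_2}{T}\right] - 1\right), \tag{5.3.40}$$

which is a monotonic transformation of $J_2$. This shows that $J_1$ can be
interpreted as a likelihood ratio test.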
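Equations (5.3.34), (5.3.37), and (5.3.40) are exact algebraic identities for any sample, which makes them easy to confirm numerically. The sketch below uses simulated data with hypothetical parameters. It computes $J_2$ from log-determinants via `numpy.linalg.slogdet`, which avoids under- or overflow in the determinant itself, and checks each identity.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 60, 4
z_m = rng.normal(0.006, 0.045, T)
Z = 0.001 + np.outer(z_m, [0.7, 1.0, 1.2, 1.4]) + rng.normal(0.0, 0.03, (T, N))

# Unconstrained MLE, (5.3.13)-(5.3.15); np.var divides by T like the MLE.
mu_m, sig2_m = z_m.mean(), z_m.var()
beta_hat = (Z - Z.mean(axis=0)).T @ (z_m - mu_m) / (T * sig2_m)
alpha_hat = Z.mean(axis=0) - beta_hat * mu_m
e_hat = Z - alpha_hat - np.outer(z_m, beta_hat)
sigma_hat = e_hat.T @ e_hat / T

# Constrained MLE with alpha = 0, (5.3.24)-(5.3.25).
beta_star = Z.T @ z_m / (z_m @ z_m)
e_star = Z - np.outer(z_m, beta_star)
sigma_star = e_star.T @ e_star / T

# Likelihood ratio statistic (5.3.33) and F statistic (5.3.23).
J2 = T * (np.linalg.slogdet(sigma_star)[1] - np.linalg.slogdet(sigma_hat)[1])
quad = alpha_hat @ np.linalg.solve(sigma_hat, alpha_hat) / (1.0 + mu_m**2 / sig2_m)
J1 = (T - N - 1) / N * quad
```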

Since the finite-sample distribution of $J_1$ is known, equation (5.3.40)
can also be used to derive the finite-sample distribution of $J_2$. As we shall
see, under the null hypothesis the finite-sample distribution of $J_2$ can differ
from its large-sample distribution. Jobson and Korkie (1982) suggest an
adjustment to $J_2$ which has better finite-sample properties. Defining $J_3$ as
the modified statistic, we have

$$J_3 = \frac{T - \frac{N}{2} - 2}{T}\,J_2 = \left(T - \frac{N}{2} - 2\right)\left[\log|\hat\Sigma^*| - \log|\hat\Sigma|\right] \;\overset{a}{\sim}\; \chi^2_N. \tag{5.3.41}$$

We will visit the issue of the finite-sample properties of $J_2$ and $J_3$ in
Section 5.4.
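The Jobson-Korkie scaling in (5.3.41) is a one-line change to $J_2$. The sketch below computes both statistics and, in a small simulation under $H_0$ with a hypothetical design ($T = 60$, $N = 10$, simulated normal returns), compares their average values with $N$, the mean of the $\chi^2_N$ limit.

```python
import numpy as np

def lr_statistics(Z, z_m):
    """Return (J2, J3) for H0: alpha = 0, from (5.3.33) and (5.3.41)."""
    T, N = Z.shape
    X = np.column_stack([np.ones(T), z_m])
    coef = np.linalg.lstsq(X, Z, rcond=None)[0]
    e_hat = Z - X @ coef                              # unconstrained residuals
    beta_star = Z.T @ z_m / (z_m @ z_m)
    e_star = Z - np.outer(z_m, beta_star)             # constrained residuals
    logdet_hat = np.linalg.slogdet(e_hat.T @ e_hat / T)[1]
    logdet_star = np.linalg.slogdet(e_star.T @ e_star / T)[1]
    J2 = T * (logdet_star - logdet_hat)
    J3 = (T - N / 2 - 2) / T * J2                     # Jobson-Korkie adjustment
    return J2, J3

rng = np.random.default_rng(11)
T, N = 60, 10
betas = np.linspace(0.5, 1.5, N)
sims = []
for _ in range(1000):
    z_m = rng.normal(0.006, 0.045, T)
    Z = np.outer(z_m, betas) + rng.normal(0.0, 0.03, (T, N))
    sims.append(lr_statistics(Z, z_m))
mean_J2, mean_J3 = np.mean(sims, axis=0)
print(f"mean J2 = {mean_J2:.2f}, mean J3 = {mean_J3:.2f}, chi-square mean = {N}")
```

In designs like this one the unadjusted $J_2$ tends to sit noticeably above $N$, while $J_3$ lands much closer to it.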

A useful economic interpretation can be made of the test statistic $J_1$
using results from efficient-set mathematics. Gibbons, Ross, and Shanken
(1989) show that

$$J_1 = \frac{T - N - 1}{N}\;\frac{\dfrac{\hat\mu_q^2}{\hat\sigma_q^2} - \dfrac{\hat\mu_m^2}{\hat\sigma_m^2}}{1 + \dfrac{\hat\mu_m^2}{\hat\sigma_m^2}}, \tag{5.3.42}$$

where the portfolio denoted by $q$ represents the ex post tangency portfolio
constructed as in (5.2.28) from the $N$ included assets plus the market
portfolio. Recall from Section 5.2 that the portfolio with the maximum squared
Sharpe ratio of all portfolios is the tangency portfolio. Thus when ex post
the market portfolio is the tangency portfolio $J_1$ will be equal to zero, and
as the squared Sharpe ratio of the market decreases, $J_1$ will increase,
indicating stronger evidence against the efficiency of the market portfolio. In
Section 5.7.2 we present an empirical example using $J_1$ after considering
the Black version of the CAPM in the next section.
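The Gibbons-Ross-Shanken form (5.3.42) can be checked against the direct formula (5.3.23). In the sketch below, with simulated data and hypothetical parameters, the ex post maximum squared Sharpe ratio of the $N$ assets plus the market is computed as $\hat m'\hat V^{-1}\hat m$. Here $\hat m$ and $\hat V$ are the sample mean vector and MLE covariance matrix (divisor $T$) of the excess returns, and this quadratic form is the squared Sharpe ratio of the ex post tangency portfolio.

```python
import numpy as np

rng = np.random.default_rng(21)
T, N = 120, 5
z_m = rng.normal(0.006, 0.045, T)
Z = 0.002 + np.outer(z_m, np.linspace(0.6, 1.4, N)) + rng.normal(0.0, 0.03, (T, N))

def max_squared_sharpe(X):
    """Ex post maximum squared Sharpe ratio of the columns of X (excess returns)."""
    m = X.mean(axis=0)
    V = np.cov(X, rowvar=False, bias=True)   # MLE covariance, divisor T
    return m @ np.linalg.solve(V, m)

sr2_q = max_squared_sharpe(np.column_stack([Z, z_m]))
sr2_m = z_m.mean() ** 2 / z_m.var()
J1_sharpe = (T - N - 1) / N * (sr2_q - sr2_m) / (1.0 + sr2_m)   # (5.3.42)
```

The agreement is exact in sample because, with MLE moments, $\hat\alpha'\hat\Sigma^{-1}\hat\alpha$ equals the difference between the two squared Sharpe ratios.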


5.3.2 Black Version


In the absence of a riskfree asset we consider the Black version of the CAPM
in (5.1.5). The expected return on the zero-beta portfolio $E[R_{0m}]$ is treated
as an unobservable and hence becomes an unknown model parameter.
Defining the zero-beta portfolio expected return as $\gamma$, the Black version
is

$$E[R_t] = \iota\gamma + \beta\left(E[R_{mt}] - \gamma\right) = (\iota - \beta)\gamma + \beta E[R_{mt}]. \tag{5.3.43}$$


With the Black model, the unconstrained model is the real-return market
model. Define $R_t$ as an $(N \times 1)$ vector of real returns for $N$ assets (or
portfolios of assets). For these $N$ assets, the real-return market model is

$$R_t = \alpha + \beta R_{mt} + \epsilon_t \tag{5.3.44}$$
$$E[\epsilon_t] = 0 \tag{5.3.45}$$
$$E[\epsilon_t\epsilon_t'] = \Sigma \tag{5.3.46}$$
$$E[R_{mt}] = \mu_m, \qquad E[(R_{mt} - \mu_m)^2] = \sigma_m^2 \tag{5.3.47}$$
$$\mathrm{Cov}[R_{mt}, \epsilon_t] = 0. \tag{5.3.48}$$

$\beta$ is the $(N \times 1)$ vector of asset betas, $R_{mt}$ is the time period $t$ market portfolio
return, and $\alpha$ and $\epsilon_t$ are $(N \times 1)$ vectors of asset return intercepts and
disturbances, respectively.

The testable implication of the Black version is apparent from comparing
the unconditional expectation of (5.3.44) with (5.3.43). The implication
is

$$\alpha = (\iota - \beta)\gamma. \tag{5.3.49}$$

This implication is more complicated to test than the zero-intercept
restriction of the Sharpe-Lintner version because the parameters $\beta$ and $\gamma$ enter
in a nonlinear fashion.

Given the IID assumption and the joint normality of returns, the Black
version of the CAPM can be estimated and tested using the maximum
likelihood approach. The maximum likelihood estimators of the unrestricted
model, that is, the real-return market model in (5.3.44), are identical to the
estimators of the excess-return market model except that real returns are
substituted for excess returns. Thus $\hat\mu$, for example, is now the vector of
sample mean real returns. For the maximum likelihood estimators of the
parameters we have

$$\hat\alpha = \hat\mu - \hat\beta\hat\mu_m \tag{5.3.50}$$
$$\hat\beta = \frac{\sum_{t=1}^{T}(R_t - \hat\mu)(R_{mt} - \hat\mu_m)}{\sum_{t=1}^{T}(R_{mt} - \hat\mu_m)^2} \tag{5.3.51}$$
$$\hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}(R_t - \hat\alpha - \hat\beta R_{mt})(R_t - \hat\alpha - \hat\beta R_{mt})', \tag{5.3.52}$$

where

$$\hat\mu = \frac{1}{T}\sum_{t=1}^{T} R_t \quad\text{and}\quad \hat\mu_m = \frac{1}{T}\sum_{t=1}^{T} R_{mt}.$$
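
Conditional on the real return of the market, $R_{m1}, R_{m2}, \ldots, R_{mT}$, the
distributions are

$$\hat\alpha \sim \mathcal{N}\left(\alpha,\ \frac{1}{T}\left[1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right]\Sigma\right) \tag{5.3.53}$$
$$\hat\beta \sim \mathcal{N}\left(\beta,\ \frac{1}{T}\,\frac{1}{\hat\sigma_m^2}\,\Sigma\right) \tag{5.3.54}$$
$$T\hat\Sigma \sim \mathcal{W}_N(T - 2, \Sigma), \tag{5.3.55}$$

where

$$\hat\sigma_m^2 = \frac{1}{T}\sum_{t=1}^{T}(R_{mt} - \hat\mu_m)^2.$$

The covariance of $\hat\alpha$ and $\hat\beta$ is

$$\mathrm{Cov}[\hat\alpha, \hat\beta] = -\frac{1}{T}\,\frac{\hat\mu_m}{\hat\sigma_m^2}\,\Sigma. \tag{5.3.56}$$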

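For the constrained model, that is, the Black version of the CAPM, the
log-likelihood function is

$$
\begin{aligned}
\mathcal{L}(\gamma, \beta, \Sigma) = {}& -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| \\
& - \frac{1}{2}\sum_{t=1}^{T}\big(R_t - \gamma(\iota - \beta) - \beta R_{mt}\big)'\,\Sigma^{-1}\big(R_t - \gamma(\iota - \beta) - \beta R_{mt}\big).
\end{aligned}
\tag{5.3.57}
$$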


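Differentiating with respect to $\gamma$, $\beta$, and $\Sigma$, we have

$$
\begin{aligned}
\frac{\partial\mathcal{L}}{\partial\gamma} &= (\iota - \beta)'\,\Sigma^{-1}\left[\sum_{t=1}^{T}\big(R_t - \gamma(\iota - \beta) - \beta R_{mt}\big)\right] \qquad (5.3.58)\\
\frac{\partial\mathcal{L}}{\partial\beta} &= \Sigma^{-1}\left[\sum_{t=1}^{T}\big(R_t - \gamma(\iota - \beta) - \beta R_{mt}\big)(R_{mt} - \gamma)\right] \qquad (5.3.59)\\
\frac{\partial\mathcal{L}}{\partial\Sigma} &= -\frac{T}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}\left[\sum_{t=1}^{T}\big(R_t - \gamma(\iota - \beta) - \beta R_{mt}\big)\big(R_t - \gamma(\iota - \beta) - \beta R_{mt}\big)'\right]\Sigma^{-1}. \qquad (5.3.60)
\end{aligned}
$$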
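Setting (5.3.58), (5.3.59), and (5.3.60) to zero, we can solve for the
maximum likelihood estimators. These are:

$$
\begin{aligned}
\hat\gamma^* &= \frac{(\iota - \hat\beta^*)'\,\hat\Sigma^{*-1}(\hat\mu - \hat\beta^*\hat\mu_m)}{(\iota - \hat\beta^*)'\,\hat\Sigma^{*-1}(\iota - \hat\beta^*)} \qquad (5.3.61)\\
\hat\beta^* &= \frac{\sum_{t=1}^{T}(R_t - \hat\gamma^*\iota)(R_{mt} - \hat\gamma^*)}{\sum_{t=1}^{T}(R_{mt} - \hat\gamma^*)^2} \qquad (5.3.62)\\
\hat\Sigma^* &= \frac{1}{T}\sum_{t=1}^{T}\big(R_t - \hat\gamma^*(\iota - \hat\beta^*) - \hat\beta^* R_{mt}\big)\big(R_t - \hat\gamma^*(\iota - \hat\beta^*) - \hat\beta^* R_{mt}\big)'. \qquad (5.3.63)
\end{aligned}
$$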


Equations (5.3.61), (5.3.62), and (5.3.63) do not allow us to solve explicitly
for the maximum likelihood estimators. The maximum likelihood estimators
can be obtained, given initial consistent estimators of $\beta$ and $\Sigma$, by
iterating over (5.3.61), (5.3.62), and (5.3.63) until convergence. The
unconstrained estimators $\hat\beta$ and $\hat\Sigma$ can serve as the initial consistent estimators
of $\beta$ and $\Sigma$, respectively.
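The iteration just described can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The helper names (`solve`, `resid_cov`, `black_mle_iterative`), the convergence tolerance, and the simulated data are all hypothetical. Each pass applies (5.3.61), then (5.3.62), then (5.3.63), starting from the unconstrained $\hat\beta$ and $\hat\Sigma$.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def resid_cov(returns, market, intercept, beta):
    """(1/T) * sum_t e_t e_t' with e_t = R_t - intercept - beta * R_mt."""
    T, N = len(returns), len(beta)
    e = [[r[i] - intercept[i] - beta[i] * x for i in range(N)] for r, x in zip(returns, market)]
    return [[sum(v[i] * v[j] for v in e) / T for j in range(N)] for i in range(N)]


def black_mle_iterative(returns, market, tol=1e-12, max_iter=500):
    """Iterate over (5.3.61)-(5.3.63), starting from the unconstrained beta and Sigma."""
    T, N = len(returns), len(returns[0])
    mu_m = sum(market) / T
    mu = [sum(r[i] for r in returns) / T for i in range(N)]
    sxx = sum((x - mu_m) ** 2 for x in market)
    beta = [sum((r[i] - mu[i]) * (x - mu_m) for r, x in zip(returns, market)) / sxx for i in range(N)]
    sigma = resid_cov(returns, market, [mu[i] - beta[i] * mu_m for i in range(N)], beta)
    gamma = None
    for _ in range(max_iter):
        d = [1.0 - b for b in beta]                      # iota - beta
        e = [mu[i] - beta[i] * mu_m for i in range(N)]   # mu_hat - beta * mu_m
        w = solve(sigma, d)                              # Sigma^{-1} (iota - beta)
        new_gamma = (sum(wi * ei for wi, ei in zip(w, e))
                     / sum(wi * di for wi, di in zip(w, d)))                      # (5.3.61)
        den = sum((x - new_gamma) ** 2 for x in market)
        beta = [sum((r[i] - new_gamma) * (x - new_gamma) for r, x in zip(returns, market)) / den
                for i in range(N)]                                                # (5.3.62)
        sigma = resid_cov(returns, market, [new_gamma * (1.0 - b) for b in beta], beta)  # (5.3.63)
        converged = gamma is not None and abs(new_gamma - gamma) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, beta, sigma


# Usage on simulated data that satisfy the Black restriction a = (iota - beta) * gamma.
rng = random.Random(42)
true_gamma, true_beta = 0.004, [0.5, 1.0, 1.5]
market = [rng.gauss(0.01, 0.05) for _ in range(3000)]
returns = [[true_gamma * (1 - b) + b * x + rng.gauss(0.0, 0.02) for b in true_beta] for x in market]
gamma_hat, beta_hat, sigma_hat = black_mle_iterative(returns, market)
```

The simulated returns satisfy the restriction with $\gamma = 0.004$, so the estimate should land near that value. Any remaining gap reflects sampling error.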

Given both the constrained and unconstrained maximum likelihood
estimators, we can construct an asymptotic likelihood ratio test of the null
hypothesis. The null and alternative hypotheses are

$$
\begin{aligned}
H_0&: a = (\iota - \beta)\,\gamma \qquad (5.3.64)\\
H_A&: a \neq (\iota - \beta)\,\gamma. \qquad (5.3.65)
\end{aligned}
$$


A likelihood ratio test can be constructed in a manner analogous to the test
constructed for the Sharpe-Lintner version in (5.3.33). Defining $J_4$ as the
test statistic, we have

$$ J_4 = T\left[\log|\hat\Sigma^*| - \log|\hat\Sigma|\right] \overset{a}{\sim} \chi^2_{N-1}. \tag{5.3.66} $$


Notice that the degrees of freedom of the null distribution is $N - 1$. Relative
to the Sharpe-Lintner version of the model, the Black version loses one
degree of freedom because the zero-beta expected return is a free parameter.
In addition to the $N(N + 1)/2$ parameters in the residual covariance matrix,
the unconstrained model has $2N$ parameters: N parameters comprising the
vector $a$ and N comprising the vector $\beta$. The constrained model has, in
addition to the same number of covariance matrix parameters, N parameters
comprising the vector $\beta$ and the parameter for the expected zero-beta
portfolio return $\gamma$. Thus the unconstrained model has $(N - 1)$ more free
parameters than the constrained model.


In the context of the Black version of the CAPM, Gibbons (1982) first developed this test;
Shanken (1985b) provides detailed analysis.
We can also adjust $J_4$ to improve the finite-sample properties. Defining
$J_5$ as the adjusted test statistic, we have

$$ J_5 = \left(T - \frac{N}{2} - 2\right)\left[\log|\hat\Sigma^*| - \log|\hat\Sigma|\right] \overset{a}{\sim} \chi^2_{N-1}. \tag{5.3.67} $$


In finite samples, the null distribution of $J_5$ will more closely match the
chi-square distribution. (See Section 5.4 for a comparison in the context of the
Sharpe-Lintner version.)
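Given the two log-determinants, both statistics take one line each to compute. The minimal sketch below assumes that $\log|\hat\Sigma^*|$ and $\log|\hat\Sigma|$ have already been computed. `chi2_sf` and `black_lr_tests` are hypothetical helpers. The chi-square tail is evaluated from the standard series for the regularized incomplete gamma function, so the block needs only the standard library.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability of a chi-square(k) variate, from the series for the
    regularized lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0.0:
        return 1.0
    a, z = 0.5 * k, 0.5 * x
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def black_lr_tests(logdet_constrained, logdet_unconstrained, T, N):
    """J4 (5.3.66) and J5 (5.3.67) with their asymptotic chi-square(N - 1) p-values."""
    diff = logdet_constrained - logdet_unconstrained   # log|Sigma*| - log|Sigma_hat|
    j4 = T * diff
    j5 = (T - N / 2.0 - 2.0) * diff
    return {"J4": j4, "p_J4": chi2_sf(j4, N - 1), "J5": j5, "p_J5": chi2_sf(j5, N - 1)}


# Hypothetical inputs: 10 portfolios, 60 months, log-determinant gap of 0.3.
result = black_lr_tests(-70.0, -70.3, T=60, N=10)
```

Because $T - N/2 - 2 < T$, the adjusted statistic is always smaller than $J_4$, so its p-value is larger.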

There are two drawbacks to the methods we have just discussed. First,
the estimation is somewhat tedious since one must iterate over the first-order
conditions. Second, the test is based on large-sample theory and can have
very poor finite-sample properties. We can use the results of Kandel (1984)
and Shanken (1986) to overcome these drawbacks. These authors show how
to calculate exact maximum likelihood estimators and how to implement
an approximate test with good finite-sample performance.

For the unconstrained model, consider the market model expressed in
terms of returns in excess of the expected zero-beta return $\gamma$:

$$ R_t - \gamma\iota = a + \beta(R_{mt} - \gamma) + \epsilon_t. \tag{5.3.68} $$


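Assume $\gamma$ is known. Then the maximum likelihood estimators for the
unconstrained model are

$$
\begin{aligned}
\hat a(\gamma) &= \hat\mu - \gamma\iota - \hat\beta(\hat\mu_m - \gamma) \qquad (5.3.69)\\
\hat\beta &= \frac{\sum_{t=1}^{T}(R_t - \hat\mu)(R_{mt} - \hat\mu_m)}{\sum_{t=1}^{T}(R_{mt} - \hat\mu_m)^2} \qquad (5.3.70)
\end{aligned}
$$

and

$$ \hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}\big[R_t - \hat\mu - \hat\beta(R_{mt} - \hat\mu_m)\big]\big[R_t - \hat\mu - \hat\beta(R_{mt} - \hat\mu_m)\big]'. \tag{5.3.71} $$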


The unconstrained estimators of $\beta$ and $\Sigma$ do not depend on the value of $\gamma$
but, as indicated, the estimator of $a$ does. The value of the unconstrained
log-likelihood function evaluated at the maximum likelihood estimators is

$$ \mathcal{L} = -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log|\hat\Sigma| - \frac{NT}{2}, \tag{5.3.72} $$

which does not depend on $\gamma$.
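Constraining $a$ to be zero, the constrained estimators are

$$
\begin{aligned}
\hat\beta^*(\gamma) &= \frac{\sum_{t=1}^{T}(R_t - \gamma\iota)(R_{mt} - \gamma)}{\sum_{t=1}^{T}(R_{mt} - \gamma)^2} \qquad (5.3.73)\\
\hat\Sigma^*(\gamma) &= \frac{1}{T}\sum_{t=1}^{T}\big[R_t - \gamma\iota - \hat\beta^*(R_{mt} - \gamma)\big]\big[R_t - \gamma\iota - \hat\beta^*(R_{mt} - \gamma)\big]', \qquad (5.3.74)
\end{aligned}
$$

and the value of the constrained likelihood function is

$$ \mathcal{L}^*(\gamma) = -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log|\hat\Sigma^*(\gamma)| - \frac{NT}{2}. \tag{5.3.75} $$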


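Note that the constrained function does depend on $\gamma$. Forming the
logarithm of the likelihood ratio, we have

$$ \mathcal{LR}(\gamma) = \mathcal{L}^*(\gamma) - \mathcal{L} = -\frac{T}{2}\left[\log|\hat\Sigma^*(\gamma)| - \log|\hat\Sigma|\right]. \tag{5.3.76} $$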


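The value of $\gamma$ that maximizes $\mathcal{LR}(\gamma)$, and hence minimizes the likelihood
ratio statistic $-2\,\mathcal{LR}(\gamma)$, is the value which maximizes the constrained
log-likelihood function and thus is the maximum likelihood estimator of $\gamma$.

Using the same development as for the Sharpe-Lintner version, the
log-likelihood ratio can be simplified to

$$ \mathcal{LR}(\gamma) = -\frac{T}{2}\log\left[1 + \frac{\big(\hat\mu - \gamma\iota - \hat\beta(\hat\mu_m - \gamma)\big)'\,\hat\Sigma^{-1}\big(\hat\mu - \gamma\iota - \hat\beta(\hat\mu_m - \gamma)\big)}{1 + (\hat\mu_m - \gamma)^2/\hat\sigma_m^2}\right]. \tag{5.3.77} $$

Maximizing $\mathcal{LR}(\gamma)$ with respect to $\gamma$ is equivalent to maximizing $G$, where

$$ G = \left(1 + \frac{(\hat\mu_m - \gamma)^2}{\hat\sigma_m^2}\right)\left[\big(\hat\mu - \gamma\iota - \hat\beta(\hat\mu_m - \gamma)\big)'\,\hat\Sigma^{-1}\big(\hat\mu - \gamma\iota - \hat\beta(\hat\mu_m - \gamma)\big)\right]^{-1}. \tag{5.3.78} $$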
Thus the value of $\gamma$ which maximizes $G$ will be the maximum likelihood
estimator. There are two solutions of $\partial G/\partial\gamma = 0$, and these are the real
roots of the quadratic equation

$$ H(\gamma) = A\gamma^2 + B\gamma + C, \tag{5.3.79} $$



where 
If A is greater than zero, the maximum likelihood estimator $\gamma^*$ is the largest
root, and if A is less than zero, then $\gamma^*$ is the smallest root. A will be
greater than zero if $\hat\mu_m$ is greater than the mean return on the sample
global minimum-variance portfolio; that is, the market portfolio is on the
efficient part of the constrained mean-variance frontier. We can substitute
$\gamma^*$ into (5.3.62) and (5.3.63) to obtain $\hat\beta^*$ and $\hat\Sigma^*$ without resorting to an
iterative procedure.
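The closed-form route can be sketched as follows, with some caveats. Write $d = \iota - \hat\beta$ and $e = \hat\mu - \hat\beta\hat\mu_m$, so that $\hat a(\gamma) = e - \gamma d$. The quadratic coefficients below were obtained here by differentiating $G$ in (5.3.78) with respect to $\gamma$. They may differ from the text's A, B, and C by a common nonzero factor, which leaves the roots unchanged. To avoid depending on the sign of A, the sketch keeps whichever root gives the larger value of $G$. The function name and the numerical inputs are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def zero_beta_mle(mu, beta, sigma, mu_m, var_m):
    """Maximize G in (5.3.78) over gamma by solving dG/dgamma = 0.

    mu, beta: unconstrained estimates (length N); sigma: Sigma_hat (N x N);
    mu_m, var_m: sample mean and ML variance of the market return.
    Returns (gamma_star, G) where G is the objective as a function of gamma.
    """
    n = len(mu)
    d = [1.0 - b for b in beta]                     # iota - beta_hat
    e = [mu[i] - beta[i] * mu_m for i in range(n)]  # mu_hat - beta_hat * mu_m
    sd, se = solve(sigma, d), solve(sigma, e)
    a1 = sum(x * y for x, y in zip(d, sd))          # d' Sigma^{-1} d
    b1 = sum(x * y for x, y in zip(e, sd))          # e' Sigma^{-1} d
    c1 = sum(x * y for x, y in zip(e, se))          # e' Sigma^{-1} e
    m2 = mu_m ** 2 + var_m

    def G(g):
        q = c1 - 2.0 * g * b1 + g * g * a1          # a_hat(g)' Sigma^{-1} a_hat(g)
        return (1.0 + (mu_m - g) ** 2 / var_m) / q

    # Quadratic proportional to dG/dgamma (derived here, see lead-in).
    qa, qb, qc = b1 - mu_m * a1, a1 * m2 - c1, mu_m * c1 - b1 * m2
    if qa == 0.0:
        roots = [-qc / qb]
    else:
        disc = qb * qb - 4.0 * qa * qc
        if disc < 0.0:
            raise ValueError("no real stationary point")
        r = math.sqrt(disc)
        roots = [(-qb - r) / (2.0 * qa), (-qb + r) / (2.0 * qa)]
    # One root is the maximum of G and the other the minimum: keep the maximizer.
    return max(roots, key=G), G


gamma_star, G = zero_beta_mle(mu=[0.012, 0.009], beta=[1.3, 0.7],
                              sigma=[[0.0016, 0.0004], [0.0004, 0.0009]],
                              mu_m=0.008, var_m=0.0025)
```

$G$ tends to the same limit as $\gamma \to \pm\infty$, so its single local maximum is also the global maximum. Comparing $G$ at the two roots is therefore enough to identify $\gamma^*$.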

We can construct an approximate test of the Black version using returns
in excess of $\gamma$ as in (5.3.68). If $\gamma$ is known, then the same methodology used
to construct the Sharpe-Lintner version F-test in (5.3.23) applies to testing
the null hypothesis that the zero-beta excess-return market-model intercept
is zero. The test statistic is

$$ J_6(\gamma) = \frac{T - N - 1}{N}\left[1 + \frac{(\hat\mu_m - \gamma)^2}{\hat\sigma_m^2}\right]^{-1}\hat a(\gamma)'\,\hat\Sigma^{-1}\hat a(\gamma) \sim F_{N,\,T-N-1}. \tag{5.3.80} $$
Because $\gamma$ is unknown, the test in (5.3.80) cannot be directly implemented.
But an approximate test can be implemented with $J_6(\gamma^*)$. Because $\gamma^*$
minimizes the likelihood ratio statistic $T[\log|\hat\Sigma^*(\gamma)| - \log|\hat\Sigma|]$, it also
minimizes $J_6(\gamma)$, which is a monotonic transformation of that statistic. Hence
$J_6(\gamma^*) \le J_6(\gamma_0)$, where $\gamma_0$ is the unknown true value of $\gamma$. Therefore a test using
$J_6(\gamma^*)$ will accept too often. If the null hypothesis is rejected using $\gamma^*$, it will
be rejected for any value of $\gamma_0$. This testing approach can provide a useful
check because the usual asymptotic likelihood ratio test in (5.3.66) has been
found to reject too often.
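$J_6(\gamma)$ equals $(T-N-1)/N$ divided by $G$ in (5.3.78). The $\gamma$ that minimizes $J_6$ is therefore the same one that maximizes $G$. The sketch below evaluates (5.3.80) on a grid of $\gamma$ values. Its function name and inputs are hypothetical. The grid minimum approximates $J_6(\gamma^*)$, the conservative statistic described above.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def j6_statistic(gamma, mu, beta, sigma, mu_m, var_m, T):
    """J6(gamma) from (5.3.80) for a given zero-beta rate gamma."""
    n = len(mu)
    a = [mu[i] - gamma - beta[i] * (mu_m - gamma) for i in range(n)]   # a_hat(gamma), (5.3.69)
    quad = sum(x * y for x, y in zip(a, solve(sigma, a)))             # a' Sigma^{-1} a
    return (T - n - 1) / n * quad / (1.0 + (mu_m - gamma) ** 2 / var_m)


# Usage: scan gamma on a grid; the minimum approximates the conservative J6(gamma*).
inputs = dict(mu=[0.012, 0.009], beta=[1.3, 0.7],
              sigma=[[0.0016, 0.0004], [0.0004, 0.0009]],
              mu_m=0.008, var_m=0.0025, T=120)
grid = [i * 1e-4 for i in range(-200, 201)]
values = [j6_statistic(g, **inputs) for g in grid]
j6_min = min(values)
gamma_min = grid[values.index(j6_min)]
```

To use it as a test, compare `j6_min` with the upper critical value of $F_{N,\,T-N-1}$. A rejection with this minimized value implies a rejection at any $\gamma_0$.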

Finally, we consider inferences for the expected zero-beta portfolio
return. Given the maximum likelihood estimator of $\gamma$, we require its
asymptotic variance to make inferences. Using the Fisher information matrix, the
asymptotic variance of the maximum likelihood estimator of $\gamma$ is

$$ \operatorname{Var}[\hat\gamma^*] = \frac{1}{T}\left(1 + \frac{(\mu_m - \gamma)^2}{\sigma_m^2}\right)\left[(\iota - \beta)'\,\Sigma^{-1}(\iota - \beta)\right]^{-1}. \tag{5.3.81} $$

This expression can be evaluated at the maximum likelihood estimates, and
then inferences concerning the value of $\gamma$ are possible given the asymptotic
normality of $\hat\gamma^*$.
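Evaluating (5.3.81) at the estimates gives a standard error and an approximate confidence interval for the zero-beta rate. The sketch below uses a hypothetical function name and made-up one-asset inputs. The 1.96 multiplier relies on the asymptotic normality mentioned above.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def zero_beta_se(gamma, beta, sigma, mu_m, var_m, T):
    """Asymptotic standard error of gamma* from (5.3.81), evaluated at estimates."""
    d = [1.0 - b for b in beta]
    info = sum(x * y for x, y in zip(d, solve(sigma, d)))   # (iota - beta)' Sigma^{-1} (iota - beta)
    return math.sqrt((1.0 + (mu_m - gamma) ** 2 / var_m) / (T * info))


# Hypothetical estimates: one asset, 30 years of monthly data.
se = zero_beta_se(0.004, [0.5], [[0.0004]], 0.008, 0.0025, 360)
ci_95 = (0.004 - 1.96 * se, 0.004 + 1.96 * se)   # relies on asymptotic normality
```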


5.4 Size of Tests 


In some econometric models there are no analytical results on the finite-sample
properties of estimators. In such cases, it is common to rely on large-sample
statistics to draw inferences. This reliance opens up the possibility
that the size of the test will be incorrect if the sample size is not large enough
for the asymptotic results to provide a good approximation. Because there
is no standard sample size for which large-sample theory can be applied, it
is good practice to investigate the appropriateness of the theory.

The multivariate F-test we have developed provides an ideal framework
for illustrating the problems that can arise when one relies on asymptotic
distribution theory for inference. Using the known finite-sample distribution of
the F-test statistic $J_1$, we can calculate the finite-sample size for the various
asymptotic tests. Such calculations are possible because the asymptotic test
statistics are monotonic transformations of $J_1$.

We draw on the relations of $J_1$ to the large-sample test statistics. Comparing
equations (5.3.22) and (5.3.23), for $J_0$ we have

$$ J_0 = \frac{TN}{T - N - 1}\,J_1. \tag{5.4.1} $$

Recall that in (5.3.40) for $J_2$ we have

$$ J_1 = \frac{T - N - 1}{N}\left(\exp\!\left[\frac{J_2}{T}\right] - 1\right), \tag{5.4.2} $$

and for $J_3$, from (5.4.2) and (5.3.41),

$$ J_1 = \frac{T - N - 1}{N}\left(\exp\!\left[\frac{J_3}{T - \frac{N}{2} - 2}\right] - 1\right). \tag{5.4.3} $$

Under the null hypothesis, $J_0$, $J_2$, and $J_3$ are all asymptotically distributed
chi-square with N degrees of freedom. The exact null distribution of $J_1$
is central F with N degrees of freedom in the numerator and $T - N - 1$
degrees of freedom in the denominator.

We calculate the exact size of a test based on a given large-sample statistic
and its asymptotic 5% critical value. For example, consider a test using $J_0$
with 10 portfolios and 60 months of data. In this case, under the null
hypothesis $J_0$ is asymptotically distributed as a chi-square random variate
with 10 degrees of freedom. Given this distribution, the critical value for a
test with an asymptotic size of 5% is 18.31. From (5.4.1) this value of 18.31
for $J_0$ corresponds to a critical value of $18.31 \times 49/(60 \times 10) = 1.495$ for $J_1$. Given
that the exact null distribution of $J_1$ is F with 10 degrees of freedom in the
numerator and 49 degrees of freedom in the denominator, a test using this
critical value for $J_1$ has a size of 17.0%. Thus, the asymptotic 5% test has a
size of 17.0% in a sample of 60 months; it rejects the null hypothesis more
than three times too often.
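The calculation behind Table 5.1 can be reproduced with a short script. To keep the block self-contained, the sketch does not use a statistics library. It evaluates the central F tail through the regularized incomplete beta function, using a continued-fraction method in the style of Numerical Recipes, and it inverts the chi-square CDF by bisection. All function names are hypothetical. Each asymptotic critical value is mapped to a critical value for $J_1$ through (5.4.1) to (5.4.3), and the exact size is the F tail probability beyond that value.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-15):
    """Continued fraction for I_x(a, b) (modified Lentz method, as in Numerical Recipes)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_bt) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_bt) * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """P(F > f) for a central F(d1, d2) variate."""
    return reg_inc_beta(0.5 * d2, 0.5 * d1, d2 / (d2 + d1 * f))


def chi2_cdf(x, k):
    """P(X <= x) for X ~ chi-square(k), via the regularized lower incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * k, 0.5 * x
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_ppf(p, k):
    """Chi-square quantile by bisection on chi2_cdf."""
    lo, hi = 0.0, 10.0 * k
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def finite_sample_sizes(N, T, level=0.05):
    """Exact sizes of asymptotic level-`level` tests based on J0, J2 and J3."""
    c = chi2_ppf(1.0 - level, N)          # asymptotic chi-square(N) critical value
    d2 = T - N - 1
    crit_j1 = (c * d2 / (T * N),                                  # invert (5.4.1)
               d2 / N * math.expm1(c / T),                        # (5.4.2)
               d2 / N * math.expm1(c / (T - N / 2.0 - 2.0)))      # (5.4.3)
    return tuple(f_sf(j, N, d2) for j in crit_j1)


sizes = finite_sample_sizes(10, 60)
```

For N = 10 and T = 60, the three sizes should come out close to the first row of Table 5.1.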

Table 5.1 presents this calculation for $J_0$, $J_2$, and $J_3$ using 10, 20, and 40
for values of N and using 60, 120, 180, 240, and 360 for values of T. It is
apparent that the finite-sample size of the tests is larger than the asymptotic
size of 5%. Thus the large-sample tests will reject the null hypothesis too
often. This problem is severe for the asymptotic tests based on $J_0$ and $J_2$.
When N = 10 the problem matters mainly for the low values of T. For example,
the finite-sample size of a test with an asymptotic size of 5%
is 17.0% and 9.6% for $J_0$ and $J_2$, respectively. As N increases the severity of
the problem increases. When N = 40 and T = 60 the finite-sample size of
an asymptotic 5% test is 98.5% for $J_0$ and 80.5% for $J_2$. In these cases, the
null hypothesis will be rejected most of the time even when it is true. With
N = 40, the size of a 5% asymptotic test is still overstated considerably even
when T = 360 (16.5% for $J_0$ and 9.2% for $J_2$).

The asymptotic test with a finite-sample adjustment based on $J_3$ performs
much better in finite samples than does its unadjusted counterpart.
Only in the case of N = 40 and T = 60 is the exact size significantly
overstated. This shows that finite-sample adjustments of asymptotic test statistics
can play an important role.


5.5 Power of Tests 


When drawing inferences using a given test statistic it is important to
consider its power. The power is the probability that the null hypothesis will
be rejected given that an alternative hypothesis is true. Low power against
an interesting alternative suggests that the test is not useful to discriminate
between the alternative and the null hypothesis. On the other hand, if the
power is high, then the test can be very informative but it may also reject
the null hypothesis against alternatives that are close to the null in
economic terms. In this case a rejection may be due to small, economically
unimportant deviations from the null.

To document the power of a test, it is necessary to specify the alternative
data-generating process and the size of the test. The power for a given size
of test is the probability that the test statistic is greater than the critical value
under the null hypothesis, given that the alternative hypothesis is true.

To illustrate the power of tests of the CAPM, we will focus on the test of
the Sharpe-Lintner version using $J_1$ from (5.3.23). The power of this test
Table 5.1. Finite-sample size of tests of the Sharpe-Lintner CAPM using large-sample test
statistics.

   N     T     J0      J2      J3
  10    60   0.170   0.096   0.051
  10   120   0.099   0.070   0.050
  10   180   0.080   0.062   0.050
  10   240   0.072   0.059   0.050
  10   360   0.064   0.056   0.050
  20    60   0.462   0.211   0.057
  20   120   0.200   0.105   0.051
  20   180   0.136   0.082   0.051
  20   240   0.109   0.073   0.050
  20   360   0.086   0.064   0.050
  40    60   0.985   0.805   0.141
  40   120   0.610   0.275   0.059
  40   180   0.368   0.164   0.053
  40   240   0.257   0.124   0.052
  40   360   0.165   0.092   0.051

The exact finite-sample size is presented for tests with a size of 5% asymptotically. The
finite-sample size uses the distribution of J1 and the relation between J1 and the large-sample
test statistics, J0, J2, and J3. N is the number of dependent portfolios, and T is the number of
time-series observations.


should be representative, and it is convenient to document since the exact
finite-sample distribution of $J_1$ is known under both the null and alternative
hypotheses. Conditional on the excess return of the market portfolio, for
the distribution of $J_1$, as defined in (5.3.23), we have

$$ J_1 \sim F_{N,\,T-N-1}(\delta), \tag{5.5.1} $$

where $\delta$ is the noncentrality parameter of the F distribution and

$$ \delta = T\left[1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right]^{-1}\alpha'\,\Sigma^{-1}\alpha. \tag{5.5.2} $$

To specify the distribution of $J_1$ under both the null and the alternative
hypotheses, we need to specify $\delta$, N, and T.
Under the null hypothesis $\alpha$ is zero, so in this case $\delta$ is zero and we have
the previous result that the distribution is central F with N and $T - N - 1$
degrees of freedom in the numerator and denominator, respectively. Under
the alternative hypothesis, to specify $\delta$ we need to condition on a value of
$\hat\mu_m^2/\hat\sigma_m^2$ and specify the value of $\alpha'\Sigma^{-1}\alpha$. For the value of $\hat\mu_m^2/\hat\sigma_m^2$, given a
monthly observation interval, we choose 0.013, which corresponds to an ex
post annualized mean excess return of 8% and a sample annualized standard
deviation of 20% (monthly, $(0.08/12)^2/(0.20^2/12) \approx 0.0133$).

For the quadratic term $\alpha'\Sigma^{-1}\alpha$, rather than separately specifying $\alpha$ and
$\Sigma$, we can use the following result of Gibbons, Ross, and Shanken (1989).
Recalling that q is the tangency portfolio and that m is the market portfolio,
we have

$$ \alpha'\,\Sigma^{-1}\alpha = \frac{\mu_q^2}{\sigma_q^2} - \frac{\mu_m^2}{\sigma_m^2}. \tag{5.5.3} $$


Using this relation, we need only specify the difference in the squared
Sharpe ratio for the tangency portfolio and the market portfolio. The
tangency portfolio is for the universe composed of the N included portfolios
and the market portfolio. We consider four sets of values for the tangency
portfolio parameters. For all cases the annualized standard deviation of the
tangency portfolio is set to 16%. The annualized expected excess return
then takes on four values, 8.5%, 10.2%, 11.6%, and 13.0%. Using an
annualized expected excess return of 8% for the market and an annualized
standard deviation of 20% for the market's excess return, these four values
correspond to values of 0.01, 0.02, 0.03, and 0.04 for $\delta/T$.

We consider five values for N: 1, 5, 10, 20, and 40. For T we consider
four values, 60, 120, 240, and 360, which are chosen to correspond to 5,
10, 20, and 30 years of monthly data. The power is tabulated for a test with
a size of 5%. The results are presented in Table 5.2.
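The power entries can be approximated with a self-contained sketch built on the same assumption as the size calculation. The noncentral F tail is computed as a Poisson-weighted mixture of incomplete beta functions, which is a standard representation of the noncentral F distribution. $\delta$ is built from (5.5.2) and (5.5.3) using the annualized inputs described above, converted to monthly squared Sharpe ratios. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-15):
    """Continued fraction for I_x(a, b) (modified Lentz method, as in Numerical Recipes)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_bt) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_bt) * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """P(F <= f) for a central F(d1, d2) variate."""
    return reg_inc_beta(0.5 * d1, 0.5 * d2, d1 * f / (d1 * f + d2)) if f > 0 else 0.0


def f_ppf(p, d1, d2):
    """Central F quantile by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncf_sf(f, d1, d2, lam):
    """P(F > f) for a noncentral F(d1, d2; lam), as a Poisson(lam/2) mixture of beta terms."""
    if lam == 0.0:
        return 1.0 - f_cdf(f, d1, d2)
    x = d1 * f / (d1 * f + d2)
    half = 0.5 * lam
    cdf = 0.0
    for j in range(int(half + 12.0 * math.sqrt(half) + 50)):
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))
        cdf += w * reg_inc_beta(0.5 * d1 + j, 0.5 * d2, x)
    return 1.0 - cdf


def delta_from_sharpe(mu_q, sd_q, mu_m, sd_m, T, periods_per_year=12):
    """Noncentrality (5.5.2) via (5.5.3); inputs are annualized means and std devs."""
    sr2_q = (mu_q / sd_q) ** 2 / periods_per_year   # per-period squared Sharpe ratios
    sr2_m = (mu_m / sd_m) ** 2 / periods_per_year
    return T * (sr2_q - sr2_m) / (1.0 + sr2_m)


def power_j1(N, T, delta, size=0.05):
    """Power of the level-`size` J1 test when J1 ~ F(N, T - N - 1; delta)."""
    d2 = T - N - 1
    return ncf_sf(f_ppf(1.0 - size, N, d2), N, d2, delta)


# Alternative 1 of Table 5.2 with N = 1 and T = 60.
delta = delta_from_sharpe(0.085, 0.16, 0.08, 0.20, T=60)
power_alt1 = power_j1(1, 60, delta)
```

With $\delta = 0$ the function returns the size of the test. For alternative 1 with N = 1 and T = 60, it should give a value near the 0.117 reported in Table 5.2.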

Substantial variation in the power of the test for different experimental
designs and alternatives is apparent in Table 5.2. For a fixed value of N,
considerable increases in power are possible with larger values of T. For
example, under alternative 2 for N equal to 10, the power increases from
0.082 to 0.380 as T increases from 60 to 360.

The power gain is substantial when N is reduced for a fixed alternative. 
For example, under alternative 3, for T equal to 120, the power increases 
from 0.093 to 0.475 as N decreases from 40 to 1. However, such gains would 
not be feasible in practice. As N is reduced, the Sharpe ratio of the tangency 
portfolio (and the noncentrality parameter of the F distribution) will decline 
unless the portfolios are combined in proportion to their weightings in that 
portfolio. The choice of N which maximizes the power will depend on the 


We discuss this result further in Chapter 6. 
Table 5.2. Power of F-test of Sharpe-Lintner CAPM using statistic J1.

              N = 1   N = 5   N = 10  N = 20  N = 40

Alternative 1: μq = 8.5%, σq = 16%
  T = 60      0.117   0.075   0.065   0.059   0.053
  T = 120     0.191   0.106   0.086   0.072   0.002
  T = 240     0.341   0.178   0.134   0.103   0.082
  T = 360     0.480   0.259   0.190   0.139   0.105
Alternative 2: μq = 10.2%, σq = 16%
  T = 60      0.189   0.103   0.082   0.068   0.057
  T = 120     0.339   0.174   0.130   0.098   0.077
  T = 240     0.597   0.340   0.247   0.174   0.124
  T = 360     0.770   0.508   0.380   0.207   0.183
Alternative 3: μq = 11.6%, σq = 16%
  T = 60      0.262   0.134   0.101   0.078   0.061
  T = 120     0.475   0.251   0.180   0.128   0.093
  T = 240     0.709   0.504   0.374   0.261   0.175
  T = 360     0.908   0.711   0.570   0.416   0.280
Alternative 4: μq = 13.0%, σq = 16%
  T = 60      0.334   0.167   0.121   0.089   0.065
  T = 120     0.593   0.332   0.237   0.163   0.110
  T = 240     0.873   0.647   0.502   0.356   0.234
  T = 360     0.965   0.845   0.720   0.563   0.389

The alternative hypothesis is characterized by the value of the expected excess return and
the value of the standard deviation of the tangency portfolio. The tangency portfolio is with
respect to the N included portfolios and the market portfolio. μq is the expected excess return
of the tangency portfolio, and σq is the annualized standard deviation of the excess return of
the tangency portfolio. The market portfolio is assumed to have an expected excess return of
8.0% and a standard deviation of 20%. Under the null hypothesis the market portfolio is the
tangency portfolio. N is the number of portfolios included in the test and T is the number of
months of data included.


rate at which the Sharpe ratio of the tangency portfolio declines as assets
are grouped together.

While we do not have general results about the optimal design of a
multivariate test, we can draw some insights from this power analysis. Increasing
the length of the time series can lead to a significant payoff in terms of power.
Further, the power is very sensitive to the value of N. The analysis suggests
that the value of N should be kept small, perhaps no larger than about ten.
5.6 Nonnormal and Non-IID Returns


In this section we are concerned with inferences when there are deviations
from the assumption that returns are jointly normal and IID through time.
We consider tests which accommodate nonnormality, heteroskedasticity,
and temporal dependence of returns. Such tests are of interest for two
reasons. First, while the normality assumption is sufficient, it is not necessary to
derive the CAPM as a theoretical model. Rather, the normality assumption
is adopted for statistical purposes. Without this assumption, finite-sample
properties of asset pricing model tests are difficult to derive. Second,
departures of monthly security returns from normality have been documented.
There is also abundant evidence of heteroskedasticity and temporal
dependence in stock returns. Even though temporal dependence makes the
CAPM unlikely to hold as an exact theoretical model, it is still of interest to
examine the empirical performance of the model. It is therefore desirable
to consider the effects of relaxing these statistical assumptions.

Robust tests of the CAPM can be constructed using a Generalized
Method of Moments (GMM) framework. We focus on tests of the
Sharpe-Lintner version; however, robust tests of the Black version can be constructed
in the same manner. Within the GMM framework, the distribution of returns
conditional on the market return can be both serially dependent and
conditionally heteroskedastic. We need only assume that excess asset returns
are stationary and ergodic with finite fourth moments. The subsequent
analysis draws on Section A.2 of the Appendix, which contains a general
development of the GMM methodology. We continue with a sample of T
time-series observations and N assets. Following the Appendix, we need to
set up the vector of moment conditions with zero expectation. The required
moment conditions follow from the excess-return market model. The
residual vector provides N moment conditions, and the product of the excess
return of the market and the residual vector provides another N moment
conditions. Using the notation of the Appendix, for $f_t(\theta)$ we have


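$$ f_t(\theta) = h_t \otimes \epsilon_t, \tag{5.6.1} $$

where $h_t' = [1 \;\; Z_{mt}]$, $\epsilon_t = Z_t - \alpha - \beta Z_{mt}$, and $\theta' = [\alpha' \;\; \beta']$.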

The specification of the excess-return market model implies the moment
condition $E[f_t(\theta_0)] = 0$, where $\theta_0$ is the true parameter vector. This
moment condition forms the basis for estimation and testing using a GMM
approach. GMM chooses the estimator so that linear combinations of the
sample average of this moment condition are zero. For the sample average,


See Fama (1965, 1976), Blattberg and Gonedes (1974), Affleck-Graves and McDonald
(1989), and Table 1.1 in Chapter 1.


See Chapters 2 and A, and the references given in those chapters.
we have

$$ g_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} f_t(\theta). \tag{5.6.2} $$

The GMM estimator $\hat\theta$ is chosen to minimize the quadratic form

$$ Q_T(\theta) = g_T(\theta)'\,W\,g_T(\theta), \tag{5.6.3} $$

where W is a positive definite $(2N \times 2N)$ weighting matrix. Since in this case
we have 2N moment condition equations and 2N unknown parameters, the
system is exactly identified and $\hat\theta$ can be chosen to set the average of the
sample moments $g_T(\hat\theta)$ equal to zero. The GMM estimator will not depend
on W since $Q_T(\theta)$ will attain its minimum of zero for any weighting matrix.
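The estimators from this GMM procedure are equivalent to the maximum
likelihood estimators in (5.3.13) and (5.3.14). The estimators are

$$
\begin{aligned}
\hat\alpha &= \hat\mu - \hat\beta\,\hat\mu_m \qquad (5.6.4)\\
\hat\beta &= \frac{\sum_{t=1}^{T}(Z_t - \hat\mu)(Z_{mt} - \hat\mu_m)}{\sum_{t=1}^{T}(Z_{mt} - \hat\mu_m)^2}. \qquad (5.6.5)
\end{aligned}
$$

The importance of the GMM approach for this application is that a
robust covariance matrix of the estimators can be formed. The variances
of $\hat\alpha$ and $\hat\beta$ will differ from the variances in the maximum likelihood
approach. The covariance matrix of the GMM estimator $\hat\theta$ follows from
equation (A.2.8) in the Appendix. It is

$$ V = \left[D_0'\,S_0^{-1}D_0\right]^{-1}, \tag{5.6.6} $$

where

$$ D_0 = E\left[\frac{\partial g_T(\theta)}{\partial\theta'}\right] \tag{5.6.7} $$

and

$$ S_0 = \sum_{l=-\infty}^{+\infty} E\left[f_t(\theta)\,f_{t-l}(\theta)'\right]. \tag{5.6.8} $$

The asymptotic distribution of $\hat\theta$ is normal. Thus we have

$$ \sqrt{T}\,(\hat\theta - \theta_0) \overset{a}{\sim} \mathcal{N}\!\left(0,\; \left[D_0'\,S_0^{-1}D_0\right]^{-1}\right). \tag{5.6.9} $$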


The application of the distributional result in (5.6.9) requires consistent
estimators of $D_0$ and $S_0$ since they are unknown. In this case, for $D_0$ we have

$$ D_0 = -\begin{bmatrix} 1 & \mu_m \\ \mu_m & \sigma_m^2 + \mu_m^2 \end{bmatrix} \otimes I_N. \tag{5.6.10} $$
A consistent estimator $D_T$ can easily be constructed using the maximum
likelihood estimators of $\mu_m$ and $\sigma_m^2$. To compute a consistent estimator of $S_0$, an
assumption is necessary to reduce the summation in (5.6.8) to a finite number
of terms. Section A.3 in the Appendix discusses possible assumptions.
Defining $S_T$ as a consistent estimator of $S_0$, $(1/T)[D_T'S_T^{-1}D_T]^{-1}$ is a consistent
estimator of the covariance matrix of $\hat\theta$. Noting that $\hat\alpha = R\hat\theta$ where
$R = (1 \;\; 0) \otimes I_N$, a robust estimator of $\operatorname{Var}[\hat\alpha]$ is $(1/T)R[D_T'S_T^{-1}D_T]^{-1}R'$.
Using this we can construct a chi-square test of the Sharpe-Lintner model
as in (5.3.22). The test statistic is

$$ J_7 = T\,\hat\alpha'\left[R\,[D_T'\,S_T^{-1}D_T]^{-1}R'\right]^{-1}\hat\alpha. \tag{5.6.11} $$

Under the null hypothesis $\alpha = 0$,

$$ J_7 \overset{a}{\sim} \chi^2_N. \tag{5.6.12} $$
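The robust statistic can be sketched as follows. The sketch makes assumptions that the text leaves open. $S_T$ uses the Newey-West (Bartlett-weighted) estimator with a user-chosen number of lags; zero lags gives White's heteroskedasticity-consistent estimator. The simulated heteroskedastic data are made up, and the function names are hypothetical. Because the system is exactly identified, $[D_T'S_T^{-1}D_T]^{-1}$ simplifies to $D_T^{-1}S_T(D_T^{-1})'$.

```python
import random


def inv(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gmm_j7(excess, mkt, lags=0):
    """Robust Wald statistic J7 of (5.6.11) for the Sharpe-Lintner CAPM.

    excess: T rows of N asset excess returns Z_t; mkt: T market excess returns Z_mt.
    lags:   number of Newey-West (Bartlett) lags in S_T; 0 gives White's estimator.
    """
    T, N = len(excess), len(excess[0])
    mu_m = sum(mkt) / T
    mu = [sum(z[i] for z in excess) / T for i in range(N)]
    var_m = sum((x - mu_m) ** 2 for x in mkt) / T
    beta = [sum((z[i] - mu[i]) * (x - mu_m) for z, x in zip(excess, mkt)) / (T * var_m)
            for i in range(N)]
    alpha = [mu[i] - beta[i] * mu_m for i in range(N)]
    # Moment vectors f_t = kron(h_t, eps_t) with h_t = [1, Z_mt]', as in (5.6.1)
    f = []
    for z, x in zip(excess, mkt):
        eps = [z[i] - alpha[i] - beta[i] * x for i in range(N)]
        f.append(eps + [x * e for e in eps])
    K = 2 * N

    def gamma_l(l):  # (1/T) sum_t f_t f_{t-l}'
        return [[sum(f[t][i] * f[t - l][j] for t in range(l, T)) / T for j in range(K)]
                for i in range(K)]

    S = gamma_l(0)
    for l in range(1, lags + 1):
        w, G = 1.0 - l / (lags + 1.0), gamma_l(l)
        S = [[S[i][j] + w * (G[i][j] + G[j][i]) for j in range(K)] for i in range(K)]
    # D_T from (5.6.10): -kron([[1, mu_m], [mu_m, var_m + mu_m^2]], I_N)
    blk = [[1.0, mu_m], [mu_m, var_m + mu_m ** 2]]
    D = [[-blk[i // N][j // N] if i % N == j % N else 0.0 for j in range(K)] for i in range(K)]
    Dinv = inv(D)
    V = matmul(matmul(Dinv, S), [list(c) for c in zip(*Dinv)])
    V_aa = [row[:N] for row in V[:N]]   # R V R' with R = kron((1 0), I_N)
    x = matmul(inv(V_aa), [[a] for a in alpha])
    return alpha, T * sum(a * v[0] for a, v in zip(alpha, x))


# Usage on made-up data whose residual scale depends on the market return.
rng = random.Random(7)
mkt = [rng.gauss(0.006, 0.045) for _ in range(240)]
excess = [[0.8 * x + rng.gauss(0.0, 0.02 + 0.3 * abs(x)), 1.2 * x + rng.gauss(0.0, 0.03)]
          for x in mkt]
alpha_hat, j7 = gmm_j7(excess, mkt, lags=2)
```

`j7` is compared with a chi-square critical value with N degrees of freedom, as in (5.6.12).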


MacKinlay and Richardson (1991) illustrate the bias in standard CAPM
test statistics that can result from violations of the standard distributional
assumptions. Specifically, they consider the case of contemporaneous
conditional heteroskedasticity. With contemporaneous conditional
heteroskedasticity, the variance of the market-model residuals of equation (5.3.3)
depends on the contemporaneous market return. In their example, the
assumption that excess returns are IID and jointly multivariate Student t leads
to conditional heteroskedasticity. The multivariate Student t assumption
for excess returns can be motivated both empirically and theoretically. One
empirical stylized fact from the distribution of returns literature is that
returns have fatter tails and are more peaked than one would expect from a
normal distribution. This is consistent with returns coming from a
multivariate Student t. Further, the multivariate Student t is a return distribution
for which mean-variance analysis is consistent with expected utility
maximization, making the choice theoretically appealing.
The bias in the size of the standard CAPM test for the Student t case
depends on the Sharpe ratio of the market portfolio and the degrees of
freedom of the Student t. MacKinlay and Richardson (1991) present some
estimates of the potential bias for various Sharpe ratios and for Student t
degrees of freedom equal to 5 and 10. They find that in general the bias is
small, but if the Sharpe ratio is high and the degrees of freedom small, the
bias can be substantial and lead to incorrect inferences. Calculation of the
test statistic $J_7$ based on the GMM framework provides a simple check for the
possibility that the rejection of the model is the result of heteroskedasticity
in the data.


See Ingersoll (1987), p. 104.
5.7 Implementation of Tests


In this section we consider issues relating to empirical implementation of
the test methodology. A summary of empirical results, an illustrative
implementation, and discussion of the observability of the market portfolio are
included.


5.7.1 Summary of Empirical Evidence


An enormous literature presenting empirical evidence on the
CAPM has evolved since the development of the model in the 1960s. The
early evidence was largely positive, with Black, Jensen, and Scholes (1972),
Fama and MacBeth (1973), and Blume and Friend (1973) all reporting
evidence consistent with the mean-variance efficiency of the market portfolio.
There was some evidence against the Sharpe-Lintner version of the CAPM
as the estimated mean return on the zero-beta portfolio was higher than the
riskfree return, but this could be accounted for by the Black version of the
model.

In the late 1970s less favorable evidence for the CAPM began to appear 
in the so-called anomalies literature. In the context of the tests discussed in 
this chapter, the anomalies can be thought of as firm characteristics which 
can be used to group assets together so that the tangency portfolio of the 
included portfolios has a high ex post Sharpe ratio relative to the Sharpe ratio 
of the market proxy. Alternatively, contrary to the prediction of the CAPM,
the firm characteristics provide explanatory power for the cross section of
sample mean returns beyond the beta of the CAPM.

Early anomalies included the price-earnings-ratio effect and the size
effect. Basu (1977) first reported the price-earnings-ratio effect. Basu's
finding is that the market portfolio appears not to be mean-variance efficient
relative to portfolios formed on the basis of the price-earnings ratios of
firms. Firms with low price-earnings ratios have higher sample returns, and
firms with high price-earnings ratios have lower mean returns than would
be the case if the market portfolio was mean-variance efficient. The size
effect, which was first documented by Banz (1981), is the result that low
market capitalization firms have higher sample mean returns than would
be expected if the market portfolio was mean-variance efficient. These two
anomalies are at least partially related, as the low price-earnings-ratio firms
tend to be small.

A number of other anomalies have been discovered more recently.
Fama and French (1992, 1993) find that beta cannot explain the
difference in return between portfolios formed on the basis of the ratio of book
value of equity to market value of equity. Firms with high book-market
ratios have higher average returns than is predicted by the CAPM. Similarly,
DeBondt and Thaler (1985) and Jegadeesh and Titman (1995) find that
a portfolio formed by buying stocks whose value has declined in the past
(losers) and selling stocks whose value has risen in the past (winners) has a
higher average return than the CAPM predicts. Fama (1991) provides a
good discussion of these and other anomalies.

Although the results in the anomalies literature may signal economically
important deviations from the CAPM, there is little theoretical motivation
for the firm characteristics studied in this literature. This opens up the
possibility that the evidence against the CAPM is overstated because of
data-snooping and sample selection biases. We briefly discuss these possibilities.

Data-snooping biases refer to the biases in statistical inference that result
from using information from data to guide subsequent research with the
same or related data. These biases are almost impossible to avoid due to
the nonexperimental nature of economics. We do not have the luxury of
running another experiment to create a new data set. Lo and MacKinlay
(1990b) illustrate the potential magnitude of data-snooping biases in a test of
the Sharpe-Lintner version of the CAPM. They consider the case where the
characteristic used to group stocks into portfolios (e.g., size or price-earnings
ratio) is selected not from theory but from previous observations of mean
stock returns using related data. Comparisons of the null distribution of the
test statistic with and without data-snooping suggest that the magnitude of
the biases can be immense. However, in practice, it is difficult to specify the
adjustment that should be made for data-snooping. Thus, the main message
is a warning that the biases should at least be considered as a potential
explanation for model deviations.

Sample selection biases can arise when data availability leads to certain
subsets of stocks being excluded from the analysis. For example, Kothari,
Shanken, and Sloan (1995) argue that data requirements for studies looking
at book-market ratios lead to failing stocks being excluded and a resulting
survivorship bias. Since the failing stocks would be expected to have low
returns and high book-market ratios, the average return of the included
high-book-market-ratio stocks would have an upward bias. Kothari, Shanken, and
Sloan (1995) argue that this bias is largely responsible for the previously cited
result of Fama and French (1992, 1993). However, the importance of this
particular survivorship bias is not fully resolved, as Fama and French (1996b)
dispute the conclusions of Kothari, Shanken, and Sloan. In any event, it is
clear that researchers should be aware of the potential problems that can
arise from sample selection biases.


5.7.2 Illustrative Implementation


We present tests of the Sharpe-Lintner model to illustrate the testing
methodology. We consider four test statistics: $J_1$ from (5.3.23), $J_2$ from (5.3.33), $J_3$
from (5.3.41), and $J_7$ from (5.6.11). The tests are conducted using a
thirty-year sample of monthly returns on ten portfolios. Stocks listed on the New
York Stock Exchange and on the American Stock Exchange are allocated to
the portfolios based on the market value of equity and are value-weighted
within the portfolios. The CRSP value-weighted index is used as a proxy
for the market portfolio, and the one-month US Treasury bill return is used
for the riskfree return. The sample extends from January 1965 through
December 1994.

Tests are conducted for the overall period, three ten-year subperiods, and six five-year subperiods. The subperiods are also used to form overall aggregate test statistics by assuming that the subperiod statistics are independent. The aggregate statistics for $J_2$, $J_3$, and $J_4$ are the sum of the individual statistics. The distribution of the sum under the null hypothesis will be chi-square with degrees of freedom equal to the number of subperiods times the degrees of freedom for each subperiod. The aggregate statistic for $J_1$ is calculated by scaling and summing the F statistics. The scale factor is calculated by approximating the F distribution with a scaled chi-square distribution. The approximation matches the first two moments. The degrees of freedom of the null distribution of the scaled sum of the subperiod $J_1$'s is the number of subperiods times the degrees of freedom of the chi-square approximation.
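The two-moment matching used for the $J_1$ aggregate can be sketched as follows. This is a minimal Python illustration (standard library only), not the authors' code. It assumes each subperiod $J_1$ is F-distributed with N and T − N − 1 degrees of freedom under the null, with T the number of months in the subperiod. Writing E and V for the mean and variance of F(d_1, d_2), matching a scaled chi-square $c\,\chi^2_v$ requires $cv = E$ and $2c^2v = V$, so $c = V/(2E)$ and $v = 2E^2/V$. The function names are hypothetical; the inputs are the five-year $J_1$ values from Table 5.3.

```python
def f_moments(d1: float, d2: float) -> tuple[float, float]:
    """Mean and variance of an F(d1, d2) random variable (requires d2 > 4)."""
    mean = d2 / (d2 - 2.0)
    var = 2.0 * d2**2 * (d1 + d2 - 2.0) / (d1 * (d2 - 2.0) ** 2 * (d2 - 4.0))
    return mean, var


def scaled_chi2_params(d1: float, d2: float) -> tuple[float, float]:
    """(c, v) such that c * chi2(v) matches the mean and variance of F(d1, d2).

    Solving c * v = mean and 2 * c**2 * v = var gives c = var / (2 * mean)
    and v = 2 * mean**2 / var.
    """
    mean, var = f_moments(d1, d2)
    return var / (2.0 * mean), 2.0 * mean**2 / var


def aggregate_j1(stats: list[float], n_assets: int, n_obs: int) -> tuple[float, float]:
    """Sum of rescaled subperiod J1 statistics and its approximate chi-square degrees of freedom.

    Each J1 is F(N, T - N - 1) under the null; dividing by c makes it approximately
    chi2(v), and a sum of independent chi2(v) terms is chi2(len(stats) * v).
    """
    c, v = scaled_chi2_params(n_assets, n_obs - n_assets - 1)
    return sum(s / c for s in stats), len(stats) * v


# Five-year subperiod J1 values from Table 5.3 (N = 10 portfolios, T = 60 months each).
j1_five_year = [2.038, 2.136, 1.914, 1.224, 1.732, 1.153]
agg, dof = aggregate_j1(j1_five_year, n_assets=10, n_obs=60)
print(f"aggregate J1 = {agg:.3f} on {dof:.2f} degrees of freedom")
```

This gives an aggregate of about 77.22 on roughly 47.4 degrees of freedom, close to the 77.224 reported in Table 5.3; the small gap is consistent with rounding of the subperiod values. With the three ten-year values and `n_obs=120` the same function gives about 57.69. Turning the aggregate into a p-value would additionally need a chi-square distribution function for non-integer degrees of freedom, which the sketch leaves out.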

The empirical results are reported in Table 5.3. The results present evidence against the Sharpe-Lintner CAPM. Using $J_1$, the p-value for the overall thirty-year period is 0.020, indicating that the null hypothesis is rejected at the 5% significance level. The five- and ten-year subperiod results suggest that the strongest evidence against the restrictions imposed by the model is in the first ten years of the sample, from January 1965 to December 1974.


Comparisons of the results across test statistics reveal that in finite samples inferences can differ. A comparison of the results for $J_2$ versus $J_3$ illustrates the previously discussed fact that the asymptotic likelihood ratio test tends to reject too often. The finite-sample adjustment to $J_2$ works well, as inferences with $J_3$ are almost identical to those with $J_1$.


5.7.3 Unobservability of the Market Portfolio 


In the preceding analysis, we have not addressed the problem that the return on the market portfolio is unobserved and a proxy is used in the tests. Most tests use a value- or equal-weighted basket of NYSE and AMEX stocks as the market proxy, whereas theoretically the market portfolio contains all assets. Roll (1977) emphasizes that tests of the CAPM really only reject the mean-variance efficiency of the proxy and that the model might not be rejected if the return on the true market portfolio were used. Several approaches have been suggested to consider whether inferences are sensitive to the use of a proxy in place of the market portfolio.

Table 5.3. Empirical results for tests of the Sharpe-Lintner version of the CAPM.

Time period           J1  p-value       J2  p-value       J3  p-value       J4  p-value

Five-year subperiods
1/65-12/69         2.038    0.049   20.867    0.022   18.432    0.048   22.105    0.015
1/70-12/74         2.136    0.039   21.712    0.017   19.179    0.038   21.397    0.018
1/75-12/79         1.914    0.066   19.784    0.031   17.476    0.064   27.922    0.002
1/80-12/84         1.224    0.300   13.378    0.203   11.818    0.297   13.066    0.220
1/85-12/89         1.732    0.100   18.164    0.052   16.045    0.098   16.915    0.076
1/90-12/94         1.153    0.344   12.680    0.242   11.200    0.342   12.379    0.200
Overall           77.224    0.004  106.586        *   94.151    0.003  113.785        *

Ten-year subperiods
1/65-12/74         2.400    0.013   23.883    0.008   22.490    0.013   24.649    0.006
1/75-12/84         2.248    0.020   22.503    0.013   21.190    0.020   27.192    0.002
1/85-12/94         1.900    0.053   19.281    0.037   18.157    0.052   16.373    0.089
Overall           57.690    0.001   65.667        *   61.837    0.000   68.215        *

Thirty-year period
1/65-12/94         2.159    0.020   21.612    0.017   21.192    0.020   22.176    0.014

* Less than 0.0005.

Results are for ten value-weighted portfolios (N = 10) with stocks assigned to the portfolios based on market value of equity. The CRSP value-weighted index is used as a measure of the market portfolio and a one-month Treasury bill is used as a measure of the riskfree rate. The tests are based on monthly data from January 1965 to December 1994.

One approach is advanced in Stambaugh (1982). He examines the sensitivity of tests to the exclusion of assets by considering a number of broader proxies for the market portfolio.† He shows that inferences are similar whether one uses a stock-based proxy, a stock- and bond-based proxy, or a stock-, bond-, and real-estate-based proxy. This suggests that inferences are not sensitive to the error in the proxy when viewed as a measure of the market portfolio and thus Roll's concern is not an empirical problem.

† Related work considers the possibility of accounting for the return on human capital. See Mayers (1972), Campbell (1996a), and Jagannathan and Wang (1996).
A second approach to the problem is presented by Kandel and Stambaugh (1987) and Shanken (1987a). Their papers estimate an upper bound on the correlation between the market proxy return and the true market return necessary to overturn the rejection of the CAPM. The basic finding is that if the correlation between the proxy and the true market exceeds about 0.70, then the rejection of the CAPM with a market proxy would also imply the rejection of the CAPM with the true market portfolio. Thus, as long as we believe there is a high correlation between the true market return and the proxies used, the rejections remain intact.


5.8 Cross-Sectional Regressions 


So far in this chapter we have focused on the mean-variance efficiency of the market portfolio. Another view of the CAPM is that it implies a linear relation between expected returns and market betas, with the betas completely explaining the cross section of expected returns. These implications can be tested using a cross-sectional regression methodology.

Fama and MacBeth (1973) first developed the cross-sectional regression approach. The basic idea is, for each cross section, to project the returns on the betas and then aggregate the estimates in the time dimension. Assuming that the betas are known, the regression model for the tth cross section of N assets is

$$ Z_t = \gamma_{0t}\,\iota + \gamma_{1t}\,\beta_m + \eta_t, \qquad (5.8.1) $$

where $Z_t$ is the (N×1) vector of excess asset returns for time period t, $\iota$ is an (N×1) vector of ones, and $\beta_m$ is the (N×1) vector of CAPM betas.

Implementation of the Fama-MacBeth approach involves two steps. First, given T periods of data, (5.8.1) is estimated using OLS for each t, t = 1, ..., T, giving the T estimates of $\gamma_{0t}$ and $\gamma_{1t}$. Then in the second step, the time series of $\hat\gamma_{0t}$'s and $\hat\gamma_{1t}$'s are analyzed. Defining $\gamma_0 = \mathrm{E}[\gamma_{0t}]$ and $\gamma_1 = \mathrm{E}[\gamma_{1t}]$, the implications of the Sharpe-Lintner CAPM are $\gamma_0 = 0$ (zero intercept) and $\gamma_1 > 0$ (positive market risk premium). Because the returns are normally distributed and temporally IID, the gammas will also be normally distributed and IID. Hence, given time series of $\hat\gamma_{jt}$, j = 0, 1 and t = 1, ..., T, we can test these implications using the usual t-test. Defining $t(\hat\gamma_j)$ as the t-statistic, we have


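$$ t(\hat\gamma_j) = \frac{\hat\gamma_j}{\hat\sigma_{\hat\gamma_j}}, \qquad (5.8.2) $$

where

$$ \hat\gamma_j = \frac{1}{T}\sum_{t=1}^{T}\hat\gamma_{jt} \qquad (5.8.3) $$

and

$$ \hat\sigma^2_{\hat\gamma_j} = \frac{1}{T(T-1)}\sum_{t=1}^{T}\left(\hat\gamma_{jt} - \hat\gamma_j\right)^2. \qquad (5.8.4) $$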


The distribution of $t(\hat\gamma_j)$ is Student t with (T − 1) degrees of freedom and asymptotically is standard normal. Given the test statistics, inferences can be made in the usual fashion.
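The two steps can be sketched for the single-beta specification (5.8.1). This is a minimal Python illustration (standard library only) that assumes, as the text does at this point, that the betas are known and fixed across periods. The function names and the synthetic inputs are hypothetical. Each cross section is a simple regression on a constant and $\beta_m$, so the OLS estimates have closed forms.

```python
import math
from statistics import mean


def cross_section_ols(z: list[float], beta: list[float]) -> tuple[float, float]:
    """OLS of one cross section of excess returns on a constant and the betas, as in (5.8.1).

    Returns (gamma_0t, gamma_1t).
    """
    b_bar, z_bar = mean(beta), mean(z)
    s_bz = sum((b - b_bar) * (r - z_bar) for b, r in zip(beta, z))
    s_bb = sum((b - b_bar) ** 2 for b in beta)
    gamma_1t = s_bz / s_bb
    gamma_0t = z_bar - gamma_1t * b_bar
    return gamma_0t, gamma_1t


def t_stat(gammas: list[float]) -> float:
    """t statistic (5.8.2): time-series mean (5.8.3) over its standard error from (5.8.4)."""
    T = len(gammas)
    g_bar = mean(gammas)
    var_of_mean = sum((g - g_bar) ** 2 for g in gammas) / (T * (T - 1))
    return g_bar / math.sqrt(var_of_mean)


def fama_macbeth(returns: list[list[float]], beta: list[float]) -> dict[str, float]:
    """Step 1: one cross-sectional OLS per period. Step 2: time-series means and t statistics."""
    gamma_0, gamma_1 = zip(*(cross_section_ols(z_t, beta) for z_t in returns))
    return {
        "gamma_0": mean(gamma_0),
        "gamma_1": mean(gamma_1),
        "t_gamma_0": t_stat(list(gamma_0)),
        "t_gamma_1": t_stat(list(gamma_1)),
    }


# Hypothetical example: 4 assets, 4 periods, returns generated exactly from (5.8.1) without noise.
betas = [0.5, 1.0, 1.5, 2.0]
g0_path = [0.1, -0.1, 0.05, -0.05]
g1_path = [0.8, 1.2, 0.9, 1.1]
returns = [[g0 + g1 * b for b in betas] for g0, g1 in zip(g0_path, g1_path)]
print(fama_macbeth(returns, betas))
```

Because the synthetic returns contain no disturbance, step 1 recovers each $\gamma_{0t}$ and $\gamma_{1t}$ up to rounding, so the averages come out at 0 and 1. With real data the betas would themselves be estimated first, which is the source of the errors-in-variables issue taken up in the text.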

The Fama-MacBeth approach is particularly useful because it can easily be modified to accommodate additional risk measures beyond the CAPM beta. By adding additional risk measures, we can examine the hypothesis that beta completely describes the cross-sectional variation in expected returns. For example, we can consider whether firm size has explanatory power for the cross section of expected returns, where firm size is defined as the logarithm of the market value of equity. Defining $c_t$ as the (N×1) vector with elements corresponding to firm size at the beginning of period t, we can augment (5.8.1) to investigate whether firm size has explanatory power not captured by the market beta:

$$ Z_t = \gamma_{0t}\,\iota + \gamma_{1t}\,\beta_m + \gamma_{2t}\,c_t + \eta_t. \qquad (5.8.5) $$

Using the $\hat\gamma_{2t}$'s from (5.8.5), we can test the hypothesis that size does not have any explanatory power beyond beta, that is, $\gamma_2 = 0$, by setting j = 2 in (5.8.2)-(5.8.4).
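With firm size added, each cross section of (5.8.5) becomes a multiple regression. The Python sketch below (standard library only) solves the small normal equations by Gaussian elimination and then forms the t-statistic for every $\gamma_j$, including j = 2, as in (5.8.2)-(5.8.4). The helper names and the synthetic betas, sizes, and gammas are hypothetical, and holding size fixed across periods is a simplification; in practice a numerical linear-algebra library would normally do the solving.

```python
import math
from statistics import mean


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def cross_section_gammas(z: list[float], columns: list[list[float]]) -> list[float]:
    """OLS of z on a constant plus the given columns; returns [gamma_0t, gamma_1t, ...]."""
    x = [[1.0] * len(z)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in x] for ci in x]
    xtz = [sum(u * v for u, v in zip(ci, z)) for ci in x]
    return solve(xtx, xtz)


def fama_macbeth_multi(returns, columns_by_period):
    """List of (mean gamma_j, t statistic) pairs for j = 0, 1, ..., using (5.8.2)-(5.8.4)."""
    gammas = [cross_section_gammas(z, cols) for z, cols in zip(returns, columns_by_period)]
    T = len(gammas)
    results = []
    for j in range(len(gammas[0])):
        series = [g[j] for g in gammas]
        g_bar = mean(series)
        se = math.sqrt(sum((g - g_bar) ** 2 for g in series) / (T * (T - 1)))
        results.append((g_bar, g_bar / se))
    return results


# Hypothetical data: 5 assets, 4 periods; size is the log market value of equity.
betas = [0.6, 0.9, 1.0, 1.3, 1.5]
size = [8.0, 6.5, 7.2, 5.9, 9.1]
true_gammas = [(0.01, 0.8, -0.01), (-0.01, 1.2, 0.0), (0.02, 0.9, -0.02), (0.0, 1.1, 0.01)]
returns = [[g0 + g1 * b + g2 * c for b, c in zip(betas, size)] for g0, g1, g2 in true_gammas]
columns = [[betas, size]] * len(returns)
for j, (g_bar, t) in enumerate(fama_macbeth_multi(returns, columns)):
    print(f"gamma_{j}: mean = {g_bar:.4f}, t = {t:.3f}")
```

The row for `gamma_2` is the size test: a t-statistic near zero would be consistent with size having no explanatory power beyond beta.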

The Fama-MacBeth methodology, while useful, does have several problems. First, it cannot be directly applied because the market betas are not known. Thus the regressions are conducted using betas estimated from the data, which introduces an errors-in-variables complication. The errors-in-variables problem can be addressed in two ways. One approach, adopted by Fama and MacBeth, is to minimize the errors-in-variables problem by grouping the stocks into portfolios and increasing the precision of the beta estimates. A second approach, developed by Litzenberger and Ramaswamy (1979) and refined by Shanken (1992b), is to explicitly adjust the standard errors to correct for the biases introduced by the errors-in-variables. Shanken suggests multiplying $\hat\sigma^2_{\hat\gamma_j}$ in (5.8.4) by an adjustment factor $(1 + \hat\mu_m^2/\hat\sigma_m^2)$. While this approach eliminates the errors-in-variables bias in the t-statistic in (5.8.2), it does not eliminate the possibility that other variables might enter spuriously in (5.8.5) as a result of the unobservability of the true betas.
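Multiplying the squared standard error in (5.8.4) by $(1 + \hat\mu_m^2/\hat\sigma_m^2)$ divides the t-statistic (5.8.2) by the square root of that factor. A short Python sketch of that arithmetic follows. It assumes $\hat\mu_m$ and $\hat\sigma_m$ are the sample mean and standard deviation of the market excess return over the same periods; the function name and the inputs are hypothetical. For monthly data this ratio is usually well below one, so the correction tends to be modest.

```python
import math


def shanken_adjusted_t(t_unadjusted: float, mu_m: float, sigma_m: float) -> float:
    """Errors-in-variables correction of a Fama-MacBeth t statistic.

    Scaling the variance in (5.8.4) by c = 1 + (mu_m / sigma_m)**2 scales the
    standard error by sqrt(c), so the t statistic is divided by sqrt(c).
    """
    c = 1.0 + (mu_m / sigma_m) ** 2
    return t_unadjusted / math.sqrt(c)


# Hypothetical monthly inputs: unadjusted t = 2.5, market excess return mean 0.6%, s.d. 4.5%.
print(round(shanken_adjusted_t(2.5, 0.006, 0.045), 3))
```

Here the factor is about 1.018 and the t-statistic falls from 2.5 to roughly 2.478.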

The unobservability of the market portfolio is also a potential problem for the cross-sectional regression approach. Roll and Ross (1994) show that if the true market portfolio is efficient, the cross-sectional relation between expected returns and betas can be very sensitive to even small deviations of the market portfolio proxy from the true market portfolio. Thus evidence of the lack of a relation between expected return and beta could be the result of the fact that empirical work is forced to work with proxies for the market portfolio. Kandel and Stambaugh (1995) show that this extreme sensitivity can potentially be mitigated by using a generalized-least-squares (GLS) estimation approach in place of ordinary least squares. However, their result depends on knowing the true covariance matrix of returns. The gains from using GLS with an estimated covariance matrix are as yet uncertain.


5.9 Conclusion 


In this chapter we have concentrated on the classical approach to testing the unconditional CAPM. Other lines of research are also of interest. One important topic is the extension of the framework to test conditional versions of the CAPM, in which the model holds conditional on state variables that describe the state of the economy. This is useful because the CAPM can hold conditionally, period by period, and yet not hold unconditionally. Chapter 8 discusses the circumstances under which the conditional CAPM might hold in a dynamic equilibrium setting, and Chapter 12 discusses econometric methods for testing the conditional CAPM.

Another important subject is Bayesian analysis of mean-variance effi- 
ciency and the CAPM. Bayesian analysis allows the introduction of prior 
information and addresses some of the shortcomings of the classical ap- 
proach such as the stark dichotomy between acceptance and rejection of 
the model. Harvey and Zhou (1990), Kandel, McCulloch, and Stambaugh 
(1995), and Shanken (1987c) are examples of work with this perspective. 

We have shown that there is some statistical evidence against the CAPM 
in the past 30 years of US stock-market data. Despite this evidence, the 
CAPM remains a widely used tool in finance. There is controversy about
how the evidence against the model should be interpreted. Some authors
argue that the CAPM should be replaced by multifactor models with several
sources of risk; others argue that the evidence against the CAPM is overstated 
because of mismeasurement of the market portfolio, improper neglect of 
conditioning information, data-snooping, or sample-selection bias; and yet 
others claim that no risk-based model can explain the anomalies of stock- 
market behavior. In the next chapter we explore multifactor asset pricing 
models and then return to this debate in Section 6.6. 


Problems—Chapter 5 


5.1 Result 5 states that for a multiple regression of the return on any asset or portfolio $R_a$ on the return of any minimum-variance portfolio $R_p$ (except for the global minimum-variance portfolio) and the return of its associated zero-beta portfolio $R_{op}$, $R_a = \beta_0 + \beta_1 R_p + \beta_2 R_{op} + \epsilon_a$, the regression coefficients are $\beta_1 = \beta_{ap}$, $\beta_2 = 1 - \beta_{ap}$, and $\beta_0 = 0$. Show this.


5.2 Show that the intercept of the excess-return market model, $\alpha$, is zero if the market portfolio is the tangency portfolio.


5.3 Using monthly returns from the 10-year period January 1985 to December 1994 for three individual stocks of your choice, a value-weighted market index, and a Treasury bill with one month to maturity, perform the following tests of the Sharpe-Lintner Capital Asset Pricing Model.


5.3.1 Using the entire 10-year sample, regress excess returns of each stock on the excess (value-weighted) market return, and perform tests with a size of 5% that the intercept is zero. Report the point estimates, t-statistics, and whether or not you reject the CAPM. Perform regression diagnostics to check your specification.


5.3.2 For each stock, perform the same test over each of the two equipartitioned subsamples and report the point estimates, t-statistics, and whether or not you reject the CAPM in each subperiod. Also include the same diagnostics as above.


5.3.3 Combine all three stocks into a single equal-weighted portfolio and re-do the tests for the entire sample and for each of the two subsamples, and report the point estimates, t-statistics, and whether or not you reject the CAPM for the whole sample and in each subsample. Include diagnostics.


5.3.4 Jointly test that the intercepts for all three stocks are zero using the F-test statistic $J_1$ in (5.3.23) for the whole sample and for each subsample.


5.4 Derive the Gibbons, Ross, and Shanken result in equation (5.5.3).


Multifactor Pricing Models 


AT THE END OF CHAPTER 5 we summarized empirical evidence indicating 
that the CAPM beta does not completely explain the cross section of ex- 
pected asset returns. This evidence suggests that one or more additional 
factors may be required to characterize the behavior of expected returns and
naturally leads to consideration of multifactor pricing models. Theoretical
arguments also suggest that more than one factor is required, since only
under strong assumptions will the CAPM apply period by period. Two main 
theoretical approaches exist. The Arbitrage Pricing Theory (APT) devel- 
oped by Ross (1976) is based on arbitrage arguments and the Intertemporal 
Capital Asset Pricing Model (ICAPM) developed by Merton (1973a) is based 
on equilibrium arguments. In this chapter we will consider the econometric 
analysis of multifactor models. 

The chapter proceeds as follows. Section 6.1 briefly discusses the the- 
oretical background of the multifactor approaches. In Section 6.2 we con- 
sider estimation and testing of the models with known factors, while in 
Section 6.3 we develop estimators for risk premia and expected returns.
Since the factors are not always provided by theory, we discuss ways to con-
struct them in Section 6.4. Section 6.5 presents empirical results. Because
of the lack of specificity of the models, deviations can always be explained
by additional factors. This raises an issue of interpreting model violations,
which we discuss in Section 6.6.


6.1 Theoretical Background 


The Arbitrage Pricing Theory (APT) was introduced by Ross (1976) as an 
alternative to the Capital Asset Pricing Model. The APT can be more gen- 
eral than the CAPM in that it allows for multiple risk factors. Also, unlike 
the CAPM, the APT does not require the identification of the market port- 
folio. However, this generality is not without costs. In its most general form the APT provides an approximate relation for expected asset returns with an unknown number of unidentified factors. At this level rejection of the theory is impossible (unless arbitrage opportunities exist) and as a consequence testability of the model depends on the introduction of additional assumptions.¹

The Arbitrage Pricing Theory assumes that markets are competitive and 
frictionless and that the return generating process for asset returns being 
considered is 


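$$ R_i = a_i + b_i' f + \epsilon_i \qquad (6.1.1) $$
$$ \mathrm{E}[\epsilon_i \mid f] = 0 \qquad (6.1.2) $$
$$ \mathrm{E}[\epsilon_i^2 \mid f] = \sigma_i^2 \le \sigma^2 < \infty, \qquad (6.1.3) $$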


where $R_i$ is the return for asset i, $a_i$ is the intercept of the factor model, $b_i$ is a (K×1) vector of factor sensitivities for asset i, $f$ is a (K×1) vector of common factor realizations, and $\epsilon_i$ is the disturbance term. For the system of N assets,


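$$ R = a + Bf + \epsilon \qquad (6.1.4) $$
$$ \mathrm{E}[\epsilon \mid f] = 0 \qquad (6.1.5) $$
$$ \mathrm{E}[\epsilon\epsilon' \mid f] = \Sigma. \qquad (6.1.6) $$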


In the system equation, $R$ is an (N×1) vector with $R = [R_1\ R_2\ \cdots\ R_N]'$, $a$ is an (N×1) vector with $a = [a_1\ a_2\ \cdots\ a_N]'$, $B$ is an (N×K) matrix with $B = [b_1\ b_2\ \cdots\ b_N]'$, and $\epsilon$ is an (N×1) vector with $\epsilon = [\epsilon_1\ \epsilon_2\ \cdots\ \epsilon_N]'$. We further assume that the factors account for the common variation in asset returns so that the disturbance term for large well-diversified portfolios vanishes.² This requires that the disturbance terms be sufficiently uncorrelated across assets.
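Why uncorrelated disturbances vanish in a large well-diversified portfolio can be seen with a line of arithmetic. Under the simplifying assumption (ours, stronger than the text requires) that the $\epsilon_i$ are mutually uncorrelated with variances bounded by $\sigma^2$, a portfolio with weights $w$ has residual variance $\sum_i w_i^2\sigma_i^2 \le \sigma^2\sum_i w_i^2$, which for equal weights 1/N is at most $\sigma^2/N$. A small Python illustration with hypothetical variances:

```python
def residual_variance(weights: list[float], resid_vars: list[float]) -> float:
    """Variance of sum_i w_i * eps_i when the eps_i are mutually uncorrelated."""
    return sum(w * w * s2 for w, s2 in zip(weights, resid_vars))


sigma2_bound = 0.01  # hypothetical common upper bound on the residual variances
for n in (10, 100, 1000):
    equal_weights = [1.0 / n] * n
    # residual variances that differ across assets (0.005 or 0.01) but never exceed the bound
    resid_vars = [sigma2_bound * (0.5 + 0.5 * (i % 2)) for i in range(n)]
    v = residual_variance(equal_weights, resid_vars)
    print(f"N = {n:5d}: residual variance = {v:.2e} (bound sigma^2/N = {sigma2_bound / n:.2e})")
```

The residual variance shrinks in proportion to 1/N; with correlated disturbances the cross terms would not shrink in general, which is why the text requires the disturbances to be sufficiently uncorrelated.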

Given this structure, Ross (1976) shows that the absence of arbitrage in large economies implies that

$$ \mu \approx \iota\lambda_0 + B\lambda_K, \qquad (6.1.7) $$

where $\mu$ is the (N×1) expected return vector, $\lambda_0$ is the model zero-beta parameter and is equal to the riskfree return if such an asset exists, and $\lambda_K$ is a (K×1) vector of factor risk premia. Here, and throughout the chapter, let $\iota$ represent a conforming vector of ones. The relation in (6.1.7) is approximate as a finite number of assets can be arbitrarily mispriced. Because (6.1.7) is only an approximation, it does not produce directly testable restrictions for asset returns. To obtain restrictions we need to impose additional structure so that the approximation becomes exact.

¹ There has been substantial debate on the testability of the APT. Shanken (1982) and Dybvig and Ross (1985) provide one interesting exchange. Dhrymes, Friend, Gultekin, and Gultekin (1985) also question the empirical relevance of the model.

² A large well-diversified portfolio is a portfolio with a large number of stocks with weightings of order 1/N.

Connor (1984) presents a competitive equilibrium version of the APT which has exact factor pricing as a feature. In Connor's model the additional requirements are that the market portfolio be well-diversified and that the factors be pervasive. The market portfolio will be well-diversified if no single asset in the economy accounts for a significant proportion of aggregate wealth. The requirement that the factors be pervasive permits investors to diversify away idiosyncratic risk without restricting their choice of factor risk exposure.

Dybvig (1985) and Grinblatt and Titman (1985) take a different approach. They investigate the potential magnitudes of the deviations from exact factor pricing given structure on the preferences of a representative agent. Both papers conclude that given a reasonable specification of the parameters of the economy, theoretical deviations from exact factor pricing are likely to be negligible. As a consequence, empirical work based on the exact pricing relation is justified.

Exact factor pricing can also be derived in an intertemporal asset pricing framework. The Intertemporal Capital Asset Pricing Model developed in Merton (1973a) combined with assumptions on the conditional distribution of returns delivers a multifactor model. In this model, the market portfolio serves as one factor and state variables serve as additional factors. The additional factors arise from investors' demand to hedge uncertainty about future investment opportunities. Breeden (1979), Campbell (1993a, 1996), and Fama (1993) explore this model, and we discuss it in Chapter 8.

In this chapter, we will generally not differentiate the APT from the ICAPM. We will analyze models where we have exact factor pricing, that is,

$$ \mu = \iota\lambda_0 + B\lambda_K. \qquad (6.1.8) $$


There is some flexibility in the specification of the factors. Most empirical implementations choose a proxy for the market portfolio as one factor. However, different techniques are available for handling the additional factors. We will consider several cases. In one case, the factors of the APT and the state variables of the ICAPM need not be traded portfolios. In other cases the factors are returns on portfolios. These factor portfolios are called mimicking portfolios because jointly they are maximally correlated with the factors. Exact factor pricing will hold with such portfolios. Huberman, Kandel, and Stambaugh (1987) and Breeden (1979) discuss this issue in the context of the APT and ICAPM, respectively.
6.2 Estimation and Testing


In this section we consider the estimation and testing of various forms of the exact factor pricing relation. The starting point for the econometric analysis of the model is an assumption about the time-series behavior of returns. We will assume that returns conditional on the factor realizations are IID through time and jointly multivariate normal. This is a strong assumption, but it does allow for limited dependence in returns through the time-series behavior of the factors. Furthermore, this assumption can be relaxed by casting the estimation and testing problem in a Generalized Method of Moments framework as outlined in the Appendix. The GMM approach for multifactor models is just a generalization of the GMM approach to testing the CAPM presented in Chapter 5.

As previously mentioned, the multifactor models specify neither the 
number of factors nor the identification of the factors. Thus to estimate and 
test the model we need to determine the factors—an issue we will address in 
Section 6.4. In this section we will proceed by taking the number of factors 
and their identification as given. 

We consider four versions of the exact factor pricing model: (1) Factors are portfolios of traded assets and a riskfree asset exists; (2) Factors are portfolios of traded assets and there is not a riskfree asset; (3) Factors are not portfolios of traded assets; and (4) Factors are portfolios of traded assets and the factor portfolios span the mean-variance frontier of risky assets. We use maximum likelihood estimation to handle all four cases. See Shanken (1992b) for a treatment of the same four cases using a cross-sectional regression approach.

Given the joint normality assumption for the returns conditional on the factors, we can construct a test of any of the four cases using the likelihood ratio. Since derivation of the test statistic parallels the derivation of the likelihood ratio test of the CAPM presented in Chapter 5, we will not repeat it here. The likelihood ratio test statistic for all cases takes the same general form. Defining J as the test statistic we have

$$ J = -\left(T - \frac{N}{2} - K - 1\right)\left[\log|\hat\Sigma| - \log|\hat\Sigma^*|\right], \qquad (6.2.1) $$

where $\hat\Sigma$ and $\hat\Sigma^*$ are the maximum likelihood estimators of the residual covariance matrix for the unconstrained model and constrained model, respectively. T is the number of time-series observations, N is the number of included portfolios, and K is the number of factors. As discussed in Chapter 5, the statistic has been scaled by $(T - N/2 - K - 1)$ rather than the usual T to improve the convergence of the finite-sample null distribution to the large-sample distribution.³ The large-sample distribution of J under the null hypothesis will be chi-square with the degrees of freedom equal to the number of restrictions imposed by the null hypothesis.
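Once the two residual covariance estimates are in hand, (6.2.1) needs only their log-determinants. A minimal Python sketch (standard library only) computes them by Gaussian elimination; it assumes both matrices are symmetric positive definite, and the helper names and example matrices are hypothetical. Converting J into a p-value would still require a chi-square distribution function, which is omitted here.

```python
import math


def log_det(a: list[list[float]]) -> float:
    """Log-determinant of a positive definite matrix via Gaussian elimination with pivoting."""
    m = [list(row) for row in a]
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]  # a row swap flips the sign of det, not |det|
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return total


def lr_stat(sigma_unc: list[list[float]], sigma_con: list[list[float]], T: int, K: int) -> float:
    """Likelihood ratio statistic (6.2.1) with the (T - N/2 - K - 1) small-sample scaling."""
    N = len(sigma_unc)
    return -(T - N / 2 - K - 1) * (log_det(sigma_unc) - log_det(sigma_con))


# Hypothetical 2 x 2 estimates: imposing the restrictions inflates the residual covariance.
sigma_hat = [[0.0020, 0.0008], [0.0008, 0.0030]]
sigma_star = [[0.0022, 0.0009], [0.0009, 0.0031]]
print(lr_stat(sigma_hat, sigma_star, T=360, K=1))
```

Because the constrained model cannot fit better than the unconstrained one, $|\hat\Sigma^*| \ge |\hat\Sigma|$ for genuine ML estimates and J is nonnegative.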


6.2.1 Portfolios as Factors with a Riskfree Asset


We first consider the case where the factors are traded portfolios and there exists a riskfree asset. The unconstrained model will be a K-factor model expressed in excess returns. Define $Z_t$ as an (N×1) vector of excess returns for N assets (or portfolios of assets). For excess returns, the K-factor linear model is:


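$$ Z_t = a + BZ_{Kt} + \epsilon_t \qquad (6.2.2) $$
$$ \mathrm{E}[\epsilon_t] = 0 \qquad (6.2.3) $$
$$ \mathrm{E}[\epsilon_t\epsilon_t'] = \Sigma \qquad (6.2.4) $$
$$ \mathrm{E}[Z_{Kt}] = \mu_K, \quad \mathrm{E}[(Z_{Kt}-\mu_K)(Z_{Kt}-\mu_K)'] = \Omega_K \qquad (6.2.5) $$
$$ \mathrm{Cov}[Z_{Kt}, \epsilon_t'] = 0. \qquad (6.2.6) $$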


$B$ is the (N×K) matrix of factor sensitivities, $Z_{Kt}$ is the (K×1) vector of factor portfolio excess returns, and $a$ and $\epsilon_t$ are (N×1) vectors of asset return intercepts and disturbances, respectively. $\Sigma$ is the variance-covariance matrix of the disturbances, and $\Omega_K$ is the variance-covariance matrix of the factor portfolio excess returns, while 0 is a (K×N) matrix of zeroes. Exact factor pricing implies that the elements of the vector $a$ in (6.2.2) will be zero.

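For the unconstrained model in (6.2.2) the maximum likelihood estimators are just the OLS estimators:

$$ \hat a = \hat\mu - \hat B\hat\mu_K \qquad (6.2.7) $$
$$ \hat B = \left[\sum_{t=1}^{T}(Z_t-\hat\mu)(Z_{Kt}-\hat\mu_K)'\right]\left[\sum_{t=1}^{T}(Z_{Kt}-\hat\mu_K)(Z_{Kt}-\hat\mu_K)'\right]^{-1} \qquad (6.2.8) $$
$$ \hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}(Z_t-\hat a-\hat BZ_{Kt})(Z_t-\hat a-\hat BZ_{Kt})', \qquad (6.2.9) $$

where

$$ \hat\mu = \frac{1}{T}\sum_{t=1}^{T} Z_t \quad\text{and}\quad \hat\mu_K = \frac{1}{T}\sum_{t=1}^{T} Z_{Kt}. $$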


³ See equation (5.3.40) and Jobson and Korkie (1982).
For the constrained model, with $a$ constrained to be zero, the maximum likelihood estimators are

$$ \hat B^* = \left[\sum_{t=1}^{T} Z_t Z_{Kt}'\right]\left[\sum_{t=1}^{T} Z_{Kt}Z_{Kt}'\right]^{-1} \qquad (6.2.10) $$
$$ \hat\Sigma^* = \frac{1}{T}\sum_{t=1}^{T}(Z_t-\hat B^*Z_{Kt})(Z_t-\hat B^*Z_{Kt})'. \qquad (6.2.11) $$

The null hypothesis that $a$ equals zero can be tested using the likelihood ratio statistic J in (6.2.1). Under the null hypothesis the degrees of freedom of the null distribution will be N, since the null hypothesis imposes N restrictions. In this case we can also construct an exact multivariate F-test of the null hypothesis. Defining $J_1$ as the test statistic we have

$$ J_1 = \frac{T-N-K}{N}\left[1 + \hat\mu_K'\hat\Omega_K^{-1}\hat\mu_K\right]^{-1}\hat a'\hat\Sigma^{-1}\hat a, \qquad (6.2.12) $$

where $\hat\Omega_K$ is the maximum likelihood estimator of $\Omega_K$,

$$ \hat\Omega_K = \frac{1}{T}\sum_{t=1}^{T}(Z_{Kt}-\hat\mu_K)(Z_{Kt}-\hat\mu_K)'. \qquad (6.2.13) $$

Under the null hypothesis, $J_1$ is unconditionally distributed central F with N degrees of freedom in the numerator and (T − N − K) degrees of freedom in the denominator. This test can be very useful since it can eliminate the problems that can accompany the use of asymptotic distribution theory. Jobson and Korkie (1985) provide a derivation of $J_1$.
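For a single factor (K = 1), $\hat\mu_K$ and $\hat\Omega_K$ are scalars and (6.2.12) reduces to $\frac{T-N-1}{N}\,[1 + \hat\mu_K^2/\hat\Omega_K]^{-1}\,\hat a'\hat\Sigma^{-1}\hat a$. The Python sketch below (standard library only) computes it from raw excess returns using the ML estimators (6.2.7)-(6.2.9) and (6.2.13). The restriction to K = 1, the function names, and the simulated data are our assumptions; comparing the result with an F(N, T − N − 1) critical value would complete the test.

```python
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def j1_single_factor(z: list[list[float]], zk: list[float]) -> float:
    """F statistic (6.2.12) for H0: a = 0 with one traded factor; z[t][i] is asset i's excess return."""
    T, N = len(z), len(z[0])
    mu = [sum(row[i] for row in z) / T for i in range(N)]
    mu_k = sum(zk) / T
    omega_k = sum((x - mu_k) ** 2 for x in zk) / T  # ML factor variance, (6.2.13)
    b = [sum((row[i] - mu[i]) * (x - mu_k) for row, x in zip(z, zk)) / (T * omega_k) for i in range(N)]
    a = [mu[i] - b[i] * mu_k for i in range(N)]  # intercepts, (6.2.7)
    e = [[row[i] - a[i] - b[i] * x for i in range(N)] for row, x in zip(z, zk)]
    sigma = [[sum(et[i] * et[j] for et in e) / T for j in range(N)] for i in range(N)]  # (6.2.9)
    quad = sum(ai * si for ai, si in zip(a, solve(sigma, a)))  # a' Sigma^{-1} a
    return (T - N - 1) / N * quad / (1.0 + mu_k**2 / omega_k)


# Hypothetical simulated data: 3 assets, 120 months, true intercepts equal to zero.
rng = random.Random(0)
zk = [rng.gauss(0.006, 0.045) for _ in range(120)]
z = [[beta * x + rng.gauss(0.0, 0.02) for beta in (0.8, 1.0, 1.2)] for x in zk]
print(j1_single_factor(z, zk))
```

With a single asset the statistic collapses to the squared OLS t-statistic of the intercept, which is a convenient check on an implementation like this one.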


6.2.2 Portfolios as Factors without a Riskfree Asset 


In the absence of a riskfree asset, there is a zero-beta model that is a multifactor equivalent of the Black version of the CAPM. In a multifactor context, the zero-beta portfolio is a portfolio with no sensitivity to any of the factors, and expected returns in excess of the zero-beta return are linearly related to the columns of the matrix of factor sensitivities. The factors are assumed to be portfolio returns in excess of the zero-beta return.

Define $R_t$ as an (N×1) vector of real returns for N assets (or portfolios of assets). For the unconstrained model, we have a K-factor linear model:

$$ R_t = a + BR_{Kt} + \epsilon_t \qquad (6.2.14) $$
$$ \mathrm{E}[\epsilon_t] = 0 \qquad (6.2.15) $$
$$ \mathrm{E}[\epsilon_t\epsilon_t'] = \Sigma \qquad (6.2.16) $$
$$ \mathrm{E}[R_{Kt}] = \mu_K, \quad \mathrm{E}[(R_{Kt}-\mu_K)(R_{Kt}-\mu_K)'] = \Omega_K \qquad (6.2.17) $$
$$ \mathrm{Cov}[R_{Kt}, \epsilon_t'] = 0. \qquad (6.2.18) $$


$B$ is the (N×K) matrix of factor sensitivities, $R_{Kt}$ is the (K×1) vector of factor portfolio real returns, and $a$ and $\epsilon_t$ are (N×1) vectors of asset return intercepts and disturbances, respectively. 0 is a (K×N) matrix of zeroes.

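For the unconstrained model in (6.2.14) the maximum likelihood estimators are

$$ \hat a = \hat\mu - \hat B\hat\mu_K \qquad (6.2.19) $$
$$ \hat B = \left[\sum_{t=1}^{T}(R_t-\hat\mu)(R_{Kt}-\hat\mu_K)'\right]\left[\sum_{t=1}^{T}(R_{Kt}-\hat\mu_K)(R_{Kt}-\hat\mu_K)'\right]^{-1} \qquad (6.2.20) $$
$$ \hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}(R_t-\hat a-\hat BR_{Kt})(R_t-\hat a-\hat BR_{Kt})', \qquad (6.2.21) $$

where

$$ \hat\mu = \frac{1}{T}\sum_{t=1}^{T} R_t \quad\text{and}\quad \hat\mu_K = \frac{1}{T}\sum_{t=1}^{T} R_{Kt}. $$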

In the constrained model real returns enter in excess of the expected zero-beta portfolio return $\gamma_0$. For the constrained model, we have

$$ R_t = \iota\gamma_0 + B(R_{Kt} - \iota\gamma_0) + \epsilon_t = (\iota - B\iota)\gamma_0 + BR_{Kt} + \epsilon_t. \qquad (6.2.22) $$


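The constrained model estimators are:

$$ \hat B^* = \left[\sum_{t=1}^{T}(R_t-\iota\hat\gamma_0)(R_{Kt}-\iota\hat\gamma_0)'\right]\left[\sum_{t=1}^{T}(R_{Kt}-\iota\hat\gamma_0)(R_{Kt}-\iota\hat\gamma_0)'\right]^{-1} \qquad (6.2.23) $$
$$ \hat\Sigma^* = \frac{1}{T}\sum_{t=1}^{T}\left[R_t-\iota\hat\gamma_0-\hat B^*(R_{Kt}-\iota\hat\gamma_0)\right]\left[R_t-\iota\hat\gamma_0-\hat B^*(R_{Kt}-\iota\hat\gamma_0)\right]' \qquad (6.2.24) $$
$$ \hat\gamma_0 = \left[(\iota-\hat B^*\iota)'\hat\Sigma^{*-1}(\iota-\hat B^*\iota)\right]^{-1}\left[(\iota-\hat B^*\iota)'\hat\Sigma^{*-1}(\hat\mu-\hat B^*\hat\mu_K)\right]. \qquad (6.2.25) $$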
The maximum likelihood estimates can be obtained by iterating over (6.2.23) to (6.2.25). $\hat B$ from (6.2.20) and $\hat\Sigma$ from (6.2.21) can be used as starting values for $B$ and $\Sigma$ in (6.2.25).
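The iteration can be sketched in Python (standard library only) for a single factor portfolio (K = 1), where $B$ is an N-vector and $\iota - B\iota$ has elements $1 - b_i$. The sketch starts from the unconstrained estimates (6.2.20)-(6.2.21), then alternates between the GLS step (6.2.25) for $\hat\gamma_0$ and the updates (6.2.23)-(6.2.24) until $\hat\gamma_0$ stops changing. The restriction to K = 1, the tolerance, the iteration cap, the function names, and the simulated data are our assumptions, and convergence of this simple scheme is not guaranteed in general.

```python
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def zero_beta_ml(r: list[list[float]], rk: list[float], tol: float = 1e-12, max_iter: int = 500):
    """Iterate (6.2.25) -> (6.2.23) -> (6.2.24) with one factor; returns (gamma_0, b, sigma)."""
    T, N = len(r), len(r[0])
    mu = [sum(row[i] for row in r) / T for i in range(N)]
    mu_k = sum(rk) / T

    def fit(center: list[float], center_k: float):
        """Slopes and residual covariance of (R_t - center) on (R_Kt - center_k), no intercept."""
        dk = [x - center_k for x in rk]
        skk = sum(v * v for v in dk)
        b = [sum((row[i] - center[i]) * v for row, v in zip(r, dk)) / skk for i in range(N)]
        e = [[row[i] - center[i] - b[i] * v for i in range(N)] for row, v in zip(r, dk)]
        sigma = [[sum(et[i] * et[j] for et in e) / T for j in range(N)] for i in range(N)]
        return b, sigma

    b, sigma = fit(mu, mu_k)  # unconstrained starting values, (6.2.20)-(6.2.21)
    gamma_0 = None
    for _ in range(max_iter):
        h = [1.0 - bi for bi in b]  # elements of iota - B iota
        d = [m_i - bi * mu_k for m_i, bi in zip(mu, b)]  # elements of mu_hat - B mu_K
        s_inv_h = solve(sigma, h)
        new_gamma_0 = sum(x * y for x, y in zip(d, s_inv_h)) / sum(x * y for x, y in zip(h, s_inv_h))
        b, sigma = fit([new_gamma_0] * N, new_gamma_0)  # (6.2.23)-(6.2.24) at the new gamma_0
        if gamma_0 is not None and abs(new_gamma_0 - gamma_0) < tol:
            return new_gamma_0, b, sigma
        gamma_0 = new_gamma_0
    return gamma_0, b, sigma


# Hypothetical simulated data satisfying (6.2.22): gamma_0 = 0.004, three assets, 240 months.
rng = random.Random(1)
true_gamma_0, true_b = 0.004, (0.5, 0.8, 1.4)
rk = [rng.gauss(0.01, 0.05) for _ in range(240)]
r = [[true_gamma_0 * (1.0 - b) + b * x + rng.gauss(0.0, 1e-4) for b in true_b] for x in rk]
gamma_0_hat, b_hat, _ = zero_beta_ml(r, rk)
print(round(gamma_0_hat, 4), [round(v, 3) for v in b_hat])
```

The simulated disturbances are deliberately tiny so that the estimate should land very close to the value used to generate the data; with realistic noise the estimate of $\gamma_0$ is far less precise.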

Exact maximum likelihood estimators can also be calculated without iteration for this case. The methodology is a generalization of the approach outlined for the Black version of the CAPM in Chapter 5; it is presented by Shanken (1985a). The estimator of $\gamma_0$ is the solution of a quadratic equation. Given $\hat\gamma_0$, the constrained maximum likelihood estimators of $B$ and $\Sigma$ follow from (6.2.23) and (6.2.24).

The restrictions of the constrained model in (6.2.22) on the unconstrained model in (6.2.14) are

$$ a = (\iota - B\iota)\gamma_0. \qquad (6.2.26) $$

These restrictions can be tested using the likelihood ratio statistic J in (6.2.1). Under the null hypothesis the degrees of freedom of the null distribution will be N − 1. There is a reduction of one degree of freedom in comparison to the case with a riskfree asset. A degree of freedom is used up in estimating the zero-beta expected return.

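For use in Section 6.3, we note that the asymptotic variance of $\hat\gamma_0$ evaluated at the maximum likelihood estimators is

$$ \mathrm{Var}[\hat\gamma_0] = \frac{1}{T}\left(1 + (\hat\mu_K - \hat\gamma_0\iota)'\hat\Omega_K^{-1}(\hat\mu_K - \hat\gamma_0\iota)\right)\left[(\iota - \hat B^*\iota)'\hat\Sigma^{*-1}(\iota - \hat B^*\iota)\right]^{-1}. \qquad (6.2.27) $$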


6.2.3 Macroeconomic Variables as Factors 


Factors need not be traded portfolios of assets; in some cases proposed factors include macroeconomic variables such as innovations in GNP, changes in bond yields, or unanticipated inflation. We now consider estimating and testing exact factor pricing models with such factors.

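Again define $R_t$ as an (N×1) vector of real returns for N assets (or portfolios of assets). For the unconstrained model we have a K-factor linear model:

$$ R_t = a + Bf_{Kt} + \epsilon_t \qquad (6.2.28) $$
$$ \mathrm{E}[\epsilon_t] = 0 \qquad (6.2.29) $$
$$ \mathrm{E}[\epsilon_t\epsilon_t'] = \Sigma \qquad (6.2.30) $$
$$ \mathrm{E}[f_{Kt}] = \mu_{fK}, \quad \mathrm{E}[(f_{Kt}-\mu_{fK})(f_{Kt}-\mu_{fK})'] = \Omega_K \qquad (6.2.31) $$
$$ \mathrm{Cov}[f_{Kt}, \epsilon_t'] = 0. \qquad (6.2.32) $$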
$B$ is the (N×K) matrix of factor sensitivities, $f_{Kt}$ is the (K×1) vector of factor realizations, and $a$ and $\epsilon_t$ are (N×1) vectors of asset return intercepts and disturbances, respectively. 0 is a (K×N) matrix of zeroes.

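For the unconstrained model in (6.2.28) the maximum likelihood estimators are

$$ \hat a = \hat\mu - \hat B\hat\mu_{fK} \qquad (6.2.33) $$
$$ \hat B = \left[\sum_{t=1}^{T}(R_t-\hat\mu)(f_{Kt}-\hat\mu_{fK})'\right]\left[\sum_{t=1}^{T}(f_{Kt}-\hat\mu_{fK})(f_{Kt}-\hat\mu_{fK})'\right]^{-1} \qquad (6.2.34) $$
$$ \hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}(R_t-\hat a-\hat Bf_{Kt})(R_t-\hat a-\hat Bf_{Kt})', \qquad (6.2.35) $$

where

$$ \hat\mu = \frac{1}{T}\sum_{t=1}^{T} R_t \quad\text{and}\quad \hat\mu_{fK} = \frac{1}{T}\sum_{t=1}^{T} f_{Kt}. $$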


The constrained model is most conveniently formulated by comparing the unconditional expectation of (6.2.28) with (6.1.8). The unconditional expectation of (6.2.28) is

$$ \mu = a + B\mu_{fK}, \qquad (6.2.36) $$

where $\mu_{fK} = \mathrm{E}[f_{Kt}]$. Equating the right-hand sides of (6.1.8) and (6.2.36) we have

$$ a = \iota\lambda_0 + B(\lambda_K - \mu_{fK}). \qquad (6.2.37) $$

Defining $\gamma_0$ as the zero-beta parameter $\lambda_0$ and defining $\gamma_1$ as $(\lambda_K - \mu_{fK})$, where $\lambda_K$ is the (K×1) vector of factor risk premia, for the constrained model we have

$$ R_t = \iota\gamma_0 + B\gamma_1 + Bf_{Kt} + \epsilon_t. \qquad (6.2.38) $$


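The constrained model estimators are

$$ \hat B^* = \left[\sum_{t=1}^{T}(R_t-\iota\hat\gamma_0)(f_{Kt}+\hat\gamma_1)'\right]\left[\sum_{t=1}^{T}(f_{Kt}+\hat\gamma_1)(f_{Kt}+\hat\gamma_1)'\right]^{-1} \qquad (6.2.39) $$
$$ \hat\Sigma^* = \frac{1}{T}\sum_{t=1}^{T}\left[R_t-\iota\hat\gamma_0-\hat B^*(f_{Kt}+\hat\gamma_1)\right]\left[R_t-\iota\hat\gamma_0-\hat B^*(f_{Kt}+\hat\gamma_1)\right]' \qquad (6.2.40) $$
$$ \hat\gamma = \left[X'\hat\Sigma^{*-1}X\right]^{-1}\left[X'\hat\Sigma^{*-1}(\hat\mu-\hat B^*\hat\mu_{fK})\right], \qquad (6.2.41) $$

where in (6.2.41) $X = [\iota\ \hat B^*]$ and $\gamma = [\gamma_0\ \gamma_1']'$.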

The maximum likelihood estimates can be obtained by iterating over (6.2.39) to (6.2.41). $\hat B$ from (6.2.34) and $\hat\Sigma$ from (6.2.35) can be used as starting values for $B$ and $\Sigma$ in (6.2.41).

The restrictions of (6.2.38) on (6.2.28) are

$$a = \iota\gamma_0 + B\gamma_1. \tag{6.2.42}$$

These restrictions can be tested using the likelihood ratio statistic $J$ in (6.2.1). Under the null hypothesis the null distribution has $N - K - 1$ degrees of freedom. There are $N$ restrictions, but one degree of freedom is lost estimating $\gamma_0$, and $K$ degrees of freedom are used estimating the $K$ elements of $\lambda_K$.
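The iteration over (6.2.39) to (6.2.41) and the resulting likelihood ratio test can be sketched as follows. This is an illustrative implementation under stated assumptions: the names `constrained_mle` and `lr_statistic`, the iteration cap, and the tolerance are hypothetical choices, and the statistic computed is the plain $T(\log|\hat\Sigma^*| - \log|\hat\Sigma|)$; the $J$ of (6.2.1) may apply a small-sample scaling to it that is not reproduced here.

```python
import numpy as np


def constrained_mle(R, F, max_iter=1000, tol=1e-10):
    """Iterate (6.2.39)-(6.2.41) for R_t = iota*g0 + B g1 + B f_Kt + eps_t.

    Returns gamma = [g0, g1'], B_star (N, K), Sigma_star (N, N). Hypothetical helper.
    """
    T, N = R.shape
    iota = np.ones(N)
    mu, mu_f = R.mean(axis=0), F.mean(axis=0)
    # Starting values: unconstrained B_hat and Sigma_hat, (6.2.34)-(6.2.35)
    Rc, Fc = R - mu, F - mu_f
    B = np.linalg.solve(Fc.T @ Fc, Fc.T @ Rc).T
    e = Rc - Fc @ B.T
    Sigma = e.T @ e / T
    gamma = None
    for _ in range(max_iter):
        # (6.2.41): GLS of (mu_hat - B mu_f) on X = [iota B]
        X = np.column_stack([iota, B])
        Si = np.linalg.inv(Sigma)
        new_gamma = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ (mu - B @ mu_f))
        g0, g1 = new_gamma[0], new_gamma[1:]
        # (6.2.39): no-intercept regression of R_t - iota*g0 on f_Kt + g1
        Y, Z = R - g0, F + g1
        B = np.linalg.solve(Z.T @ Z, Z.T @ Y).T
        # (6.2.40): residual covariance at the constrained estimates
        e = Y - Z @ B.T
        Sigma = e.T @ e / T
        done = gamma is not None and np.max(np.abs(new_gamma - gamma)) < tol
        gamma = new_gamma
        if done:
            break
    return gamma, B, Sigma


def lr_statistic(R, F):
    """Unadjusted LR statistic T*(log|Sigma*| - log|Sigma_hat|) and its N - K - 1 df."""
    T, N = R.shape
    K = F.shape[1]
    Rc, Fc = R - R.mean(axis=0), F - F.mean(axis=0)
    B = np.linalg.solve(Fc.T @ Fc, Fc.T @ Rc).T
    e = Rc - Fc @ B.T
    _, logdet_u = np.linalg.slogdet(e.T @ e / T)
    _, _, Sigma_star = constrained_mle(R, F)
    _, logdet_c = np.linalg.slogdet(Sigma_star)
    return T * (logdet_c - logdet_u), N - K - 1


# Hypothetical data generated under the null (6.2.38).
rng = np.random.default_rng(1)
T, N, K = 400, 6, 2
F = rng.normal(size=(T, K))
B_true = rng.normal(size=(N, K))
g0_true, g1_true = 0.005, np.array([0.01, -0.005])
R = g0_true + (g1_true + F) @ B_true.T + 0.001 * rng.normal(size=(T, N))
gamma_hat, B_star, Sigma_star = constrained_mle(R, F)
LR, df = lr_statistic(R, F)
```

Each pass maximizes the likelihood over one block of parameters holding the others fixed, so the likelihood cannot decrease between iterations. A p-value can be read from `scipy.stats.chi2.sf(LR, df)`.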

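The asymptotic variance of $\hat\gamma$ follows from the maximum likelihood approach. The variance evaluated at the maximum likelihood estimators is

$$\widehat{\operatorname{Var}}[\hat\gamma] = \frac{1}{T}\left(1 + (\hat\gamma_1 + \hat\mu_{fK})'\hat\Omega_K^{-1}(\hat\gamma_1 + \hat\mu_{fK})\right)\left[\hat X'\hat\Sigma^{*-1}\hat X\right]^{-1}. \tag{6.2.43}$$

Applying the partitioned inverse rule to (6.2.43), for the variances of the components of $\hat\gamma$ we have estimators

$$\widehat{\operatorname{Var}}[\hat\gamma_0] = \frac{1}{T}\left(1 + (\hat\gamma_1 + \hat\mu_{fK})'\hat\Omega_K^{-1}(\hat\gamma_1 + \hat\mu_{fK})\right)\left[\iota'\hat\Sigma^{*-1}\iota - \iota'\hat\Sigma^{*-1}\hat B^*(\hat B^{*\prime}\hat\Sigma^{*-1}\hat B^*)^{-1}\hat B^{*\prime}\hat\Sigma^{*-1}\iota\right]^{-1} \tag{6.2.44}$$

$$\widehat{\operatorname{Var}}[\hat\gamma_1] = \frac{1}{T}\left(1 + (\hat\gamma_1 + \hat\mu_{fK})'\hat\Omega_K^{-1}(\hat\gamma_1 + \hat\mu_{fK})\right)\left[\hat B^{*\prime}\hat\Sigma^{*-1}\hat B^* - \hat B^{*\prime}\hat\Sigma^{*-1}\iota(\iota'\hat\Sigma^{*-1}\iota)^{-1}\iota'\hat\Sigma^{*-1}\hat B^*\right]^{-1}. \tag{6.2.45}$$

We will use these variance results for inferences concerning the factor risk premia in Section 6.3.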
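A short sketch of (6.2.43) to (6.2.45), assuming the constrained estimates $\hat\gamma_1$, $\hat B^*$, $\hat\Sigma^*$ and the factor moments $\hat\mu_{fK}$, $\hat\Omega_K$ are already in hand. The function name and the random inputs are hypothetical; the point is to show the common scale factor and to confirm numerically that the partitioned-inverse blocks agree with the full inverse.

```python
import numpy as np


def gamma_variance(T, gamma1, mu_f, Omega_K, B_star, Sigma_star):
    """Var[gamma_hat] from (6.2.43) and its blocks (6.2.44)-(6.2.45). Hypothetical helper."""
    N = B_star.shape[0]
    iota = np.ones((N, 1))
    lam = gamma1 + mu_f                            # gamma1_hat + mu_fK_hat
    c = 1.0 + lam @ np.linalg.solve(Omega_K, lam)  # 1 + lam' Omega_K^{-1} lam
    Si = np.linalg.inv(Sigma_star)
    X = np.hstack([iota, B_star])
    V = c / T * np.linalg.inv(X.T @ Si @ X)        # (6.2.43)

    a11 = (iota.T @ Si @ iota).item()              # iota' Sigma*^{-1} iota
    a12 = iota.T @ Si @ B_star                     # iota' Sigma*^{-1} B*, shape (1, K)
    A22 = B_star.T @ Si @ B_star                   # B*' Sigma*^{-1} B*, shape (K, K)
    var_g0 = c / T / (a11 - (a12 @ np.linalg.solve(A22, a12.T)).item())  # (6.2.44)
    var_g1 = c / T * np.linalg.inv(A22 - a12.T @ a12 / a11)              # (6.2.45)
    return V, var_g0, var_g1


# Hypothetical inputs, only to exercise the formulas.
rng = np.random.default_rng(2)
T, N, K = 240, 8, 3
B_star = rng.normal(size=(N, K))
L = rng.normal(size=(N, N))
Sigma_star = L @ L.T / N + 0.1 * np.eye(N)
Omega_K = np.diag([0.02, 0.03, 0.01])
gamma1 = np.array([0.001, -0.002, 0.0005])
mu_f = np.array([0.004, 0.002, 0.001])
V, var_g0, var_g1 = gamma_variance(T, gamma1, mu_f, Omega_K, B_star, Sigma_star)
```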


6.2.4 Factor Portfolios Spanning the Mean-Variance Frontier 
When factor portfolios span the mean-variance frontier, the intercept term of the exact pricing relation, $\lambda_0$, is zero without the need for a riskfree asset. Thus this case retains the simplicity of the first case with the riskfree asset. In the context of the APT, spanning occurs when two well-diversified portfolios are on the minimum-variance boundary. Chamberlain (1983a) provides discussion of this case.

The unconstrained model will be a $K$-factor model expressed in real returns. Define $R_t$ as an $(N \times 1)$ vector of real returns for $N$ assets (or portfolios of assets). Then for real returns we have a $K$-factor linear model:

$$R_t = a + B R_{Kt} + \epsilon_t \tag{6.2.46}$$

$$E[\epsilon_t] = 0 \tag{6.2.47}$$

$$E[\epsilon_t\epsilon_t'] = \Sigma \tag{6.2.48}$$

$$E[R_{Kt}] = \mu_K, \qquad E[(R_{Kt} - \mu_K)(R_{Kt} - \mu_K)'] = \Omega_K \tag{6.2.49}$$

$$\operatorname{Cov}[R_{Kt}, \epsilon_t'] = 0. \tag{6.2.50}$$

$B$ is the $(N \times K)$ matrix of factor sensitivities, $R_{Kt}$ is the $(K \times 1)$ vector of factor portfolio real returns, and $a$ and $\epsilon_t$ are $(N \times 1)$ vectors of asset return intercepts and disturbances, respectively. $0$ is a $(K \times N)$ matrix of zeroes. The restrictions on (6.2.46) imposed by the included factor portfolios spanning the mean-variance frontier are:

$$a = 0 \quad\text{and}\quad B\iota = \iota. \tag{6.2.51}$$


To understand the intuition behind these restrictions, we can return to the Black version of the CAPM from Chapter 5 and construct a spanning example. The theory underlying the model differs, but empirically the restrictions are the same as those on a two-factor APT model with spanning. The unconstrained Black model can be written as

$$R_t = a + \beta_{0m} R_{0t} + \beta_m R_{mt} + \epsilon_t, \tag{6.2.52}$$

where $R_{mt}$ and $R_{0t}$ are the returns on the market portfolio and the associated zero-beta portfolio, respectively. The restrictions on the Black model are $a = 0$ and $\beta_{0m} + \beta_m = \iota$, as shown in Chapter 5. These restrictions correspond to those in (6.2.51), with $B = [\beta_{0m} \;\; \beta_m]$.


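For the unconstrained model in (6.2.46) the maximum likelihood estimators are

$$\hat a = \hat\mu - \hat B\hat\mu_K \tag{6.2.53}$$

$$\hat B = \left[\sum_{t=1}^{T}(R_t - \hat\mu)(R_{Kt} - \hat\mu_K)'\right]\left[\sum_{t=1}^{T}(R_{Kt} - \hat\mu_K)(R_{Kt} - \hat\mu_K)'\right]^{-1} \tag{6.2.54}$$

$$\hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}(R_t - \hat a - \hat B R_{Kt})(R_t - \hat a - \hat B R_{Kt})', \tag{6.2.55}$$

where

$$\hat\mu = \frac{1}{T}\sum_{t=1}^{T}R_t \quad\text{and}\quad \hat\mu_K = \frac{1}{T}\sum_{t=1}^{T}R_{Kt}.$$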

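To estimate the constrained model, we consider the unconstrained model in (6.2.46) with the matrix $B$ partitioned into an $(N \times 1)$ column vector $b_1$ and an $(N \times (K-1))$ matrix $B_1$, and the factor portfolio vector partitioned into the first row $R_{1t}$ and the last $(K-1)$ rows $R^*_{Kt}$. With this partitioning the constraint $B\iota = \iota$ can be written $b_1 + B_1\iota = \iota$. For the unconstrained model we have

$$R_t = a + b_1 R_{1t} + B_1 R^*_{Kt} + \epsilon_t. \tag{6.2.56}$$

Substituting $a = 0$ and $b_1 = \iota - B_1\iota$ into (6.2.56) gives the constrained model,

$$R_t - \iota R_{1t} = B_1(R^*_{Kt} - \iota R_{1t}) + \epsilon_t. \tag{6.2.57}$$

Using (6.2.57) the maximum likelihood estimators are

$$\hat B_1^* = \left[\sum_{t=1}^{T}(R_t - \iota R_{1t})(R^*_{Kt} - \iota R_{1t})'\right]\left[\sum_{t=1}^{T}(R^*_{Kt} - \iota R_{1t})(R^*_{Kt} - \iota R_{1t})'\right]^{-1} \tag{6.2.58}$$

$$\hat b_1^* = \iota - \hat B_1^*\iota \tag{6.2.59}$$

$$\hat\Sigma^* = \frac{1}{T}\sum_{t=1}^{T}(R_t - \hat B^* R_{Kt})(R_t - \hat B^* R_{Kt})', \tag{6.2.60}$$

where $\hat B^* = [\hat b_1^* \;\; \hat B_1^*]$.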


The null hypothesis that $a = 0$ and $B\iota = \iota$ can be tested using the likelihood ratio statistic $J$ in (6.2.1). Under the null hypothesis the null distribution has $2N$ degrees of freedom, since $a = 0$ is $N$ restrictions and $B\iota = \iota$ is $N$ additional restrictions.

We can also construct an exact test of the null hypothesis given the linearity of the restrictions in (6.2.51) and the multivariate normality assumption. Defining $J_2$ as the test statistic we have

$$J_2 = \frac{T - N - K}{N}\left[\left(\frac{|\hat\Sigma^*|}{|\hat\Sigma|}\right)^{1/2} - 1\right]. \tag{6.2.61}$$

Under the null hypothesis, $J_2$ is unconditionally distributed central $F$ with $2N$ degrees of freedom in the numerator and $2(T - N - K)$ degrees of freedom in the denominator. Huberman and Kandel (1987) present a derivation of this test.
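The constrained fit (6.2.57) to (6.2.60) and the exact test (6.2.61) reduce to two least-squares problems. A sketch, assuming $K \ge 2$ factor portfolios with the first column playing the role of $R_{1t}$; the name `spanning_test` and the simulated data are illustrative.

```python
import numpy as np


def spanning_test(R, RK):
    """J_2 of (6.2.61). R: (T, N) real returns; RK: (T, K) factor portfolio returns, K >= 2.

    Returns (J_2, (numerator df, denominator df)). Hypothetical helper.
    """
    T, N = R.shape
    K = RK.shape[1]
    if K < 2:
        raise ValueError("this sketch partitions B as [b_1 B_1], so it needs K >= 2")
    # Unconstrained (6.2.53)-(6.2.55): OLS with an intercept, equation by equation
    X = np.column_stack([np.ones(T), RK])
    coef = np.linalg.lstsq(X, R, rcond=None)[0]
    e = R - X @ coef
    Sigma = e.T @ e / T
    # Constrained (6.2.57)-(6.2.60): regress R_t - iota R_1t on R*_Kt - iota R_1t, no intercept
    R1 = RK[:, [0]]
    Y, Z = R - R1, RK[:, 1:] - R1
    B1 = np.linalg.lstsq(Z, Y, rcond=None)[0].T   # (6.2.58), shape (N, K-1)
    e_star = Y - Z @ B1.T                          # equals R_t - B* R_Kt with b_1* from (6.2.59)
    Sigma_star = e_star.T @ e_star / T
    _, ld_star = np.linalg.slogdet(Sigma_star)
    _, ld = np.linalg.slogdet(Sigma)
    J2 = (T - N - K) / N * (np.exp(0.5 * (ld_star - ld)) - 1.0)
    return J2, (2 * N, 2 * (T - N - K))


# Hypothetical data: rows of B sum to one and a = 0 under the null.
rng = np.random.default_rng(3)
T, N, K = 300, 5, 2
RK = 0.01 + 0.05 * rng.normal(size=(T, K))
B = rng.normal(size=(N, K))
B[:, 0] = 1.0 - B[:, 1:].sum(axis=1)              # impose B iota = iota
R_null = RK @ B.T + 0.01 * rng.normal(size=(T, N))
R_alt = R_null + 0.05                              # add a nonzero intercept
J2_null, df = spanning_test(R_null, RK)
J2_alt, _ = spanning_test(R_alt, RK)
```

Since the constrained model is nested in the unconstrained one, $|\hat\Sigma^*| \ge |\hat\Sigma|$ and $J_2$ is nonnegative; `scipy.stats.f.sf(J2, *df)` would give a p-value.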


6.3 Estimation of Risk Premia and Expected Returns 


All the exact factor pricing models allow one to estimate the expected return on a given asset. Since the expected return relation is $\mu = \iota\lambda_0 + B\lambda_K$, one needs measures of the factor sensitivity matrix $B$, the riskfree rate or the zero-beta expected return $\lambda_0$, and the factor risk premia $\lambda_K$. Obtaining measures of $B$ and the riskfree rate or the expected zero-beta return is straightforward. For the given case the constrained maximum likelihood estimator $\hat B^*$ can be used for $B$. The observed riskfree rate is appropriate for the riskfree asset or, in the cases without a riskfree asset, the maximum likelihood estimator $\hat\gamma_0$ can be used for the expected zero-beta return.

Further estimation is necessary to form estimates of the factor risk premia. The appropriate procedure varies across the four cases of exact factor pricing. In the case where the factors are the excess returns on traded portfolios, the risk premia can be estimated directly from the sample means of the excess returns on the portfolios. For this case we have

$$\hat\lambda_K = \hat\mu_K = \frac{1}{T}\sum_{t=1}^{T} Z_{Kt}. \tag{6.3.1}$$

An estimator of the variance of $\hat\lambda_K$ is

$$\widehat{\operatorname{Var}}[\hat\lambda_K] = \frac{1}{T}\hat\Omega_K = \frac{1}{T^2}\sum_{t=1}^{T}(Z_{Kt} - \hat\mu_K)(Z_{Kt} - \hat\mu_K)'. \tag{6.3.2}$$
In the case where portfolios are factors but there is no riskfree asset, the factor risk premia can be estimated using the difference between the sample mean of the factor portfolios and the estimated zero-beta return:

$$\hat\lambda_K = \hat\mu_K - \iota\hat\gamma_0. \tag{6.3.3}$$

In this case, an estimator of the variance of $\hat\lambda_K$ is

$$\widehat{\operatorname{Var}}[\hat\lambda_K] = \frac{1}{T}\hat\Omega_K + \widehat{\operatorname{Var}}[\hat\gamma_0]\,\iota\iota', \tag{6.3.4}$$

where $\widehat{\operatorname{Var}}[\hat\gamma_0]$ is from (6.2.27). The fact that $\hat\mu_K$ and $\hat\gamma_0$ are independent has been utilized to set the covariance term in (6.3.4) to zero.

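In the case where the factors are not traded portfolios, an estimator of the vector of factor risk premia $\lambda_K$ is the sum of the estimator of the mean of the factor realizations and the estimator of $\gamma_1$,

$$\hat\lambda_K = \hat\mu_{fK} + \hat\gamma_1. \tag{6.3.5}$$

An estimator of the variance of $\hat\lambda_K$ is

$$\widehat{\operatorname{Var}}[\hat\lambda_K] = \frac{1}{T}\hat\Omega_K + \widehat{\operatorname{Var}}[\hat\gamma_1], \tag{6.3.6}$$

where $\widehat{\operatorname{Var}}[\hat\gamma_1]$ is from (6.2.45). Because $\hat\mu_{fK}$ and $\hat\gamma_1$ are independent, the covariance term in (6.3.6) is zero.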

The fourth case, where the factor portfolios span the mean-variance frontier, is the same as the first case except that real returns are substituted for excess returns. Here $\hat\lambda_K$ is the vector of factor portfolio sample means and $\lambda_0$ is zero.

For any asset the expected return can be estimated by substituting the estimates of $B$, $\lambda_0$, and $\lambda_K$ into (6.1.8). Since (6.1.8) is nonlinear in the parameters, calculating a standard error requires using a linear approximation and estimates of the covariances of the parameter estimates.

It is also of interest to ask if the factors are jointly priced. Given the vector of risk premia estimates and its covariance matrix, tests of the null hypothesis that the factors are jointly not priced can be conducted using the following test statistic:

$$J_3 = \frac{T - K}{TK}\,\hat\lambda_K'\,\widehat{\operatorname{Var}}[\hat\lambda_K]^{-1}\,\hat\lambda_K. \tag{6.3.7}$$

Asymptotically, under the null hypothesis that $\lambda_K = 0$, $J_3$ has an $F$ distribution with $K$ and $T - K$ degrees of freedom. This distributional result is an application of the Hotelling $T^2$ statistic and will be exact in finite samples for the cases where the estimator of $\lambda_K$ is based only on the sample means of the factors. We can also test the significance of any individual factor using

$$J_4 = \frac{\hat\lambda_{jK}}{\sqrt{v_{jj}}}, \tag{6.3.8}$$

where $\hat\lambda_{jK}$ is the $j$th element of $\hat\lambda_K$ and $v_{jj}$ is the $(j,j)$th element of $\widehat{\operatorname{Var}}[\hat\lambda_K]$. Testing if individual factors are priced is sensible for cases where the factors have been theoretically specified. With empirically derived factors, such tests are not useful because, as we explain in Section 6.4.1, factors are identified only up to an orthogonal transformation; hence individual factors do not have clear-cut economic interpretations.
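For the first case (traded excess-return factors), (6.3.1), (6.3.2), (6.3.7), and (6.3.8) take only a few lines. This sketch assumes a $(T \times K)$ array `ZK` of factor excess returns; the name `factor_premium_tests` and the simulated data are hypothetical, and it does not add the extra variance terms that (6.3.4) and (6.3.6) require in the other cases. The last lines check the Hotelling $T^2$ connection: rescaling $T^2$, built from the $(T-1)$-divisor covariance, by $(T-K)/(K(T-1))$ reproduces $J_3$.

```python
import numpy as np


def factor_premium_tests(ZK):
    """Case 1: (6.3.1), (6.3.2), (6.3.7), (6.3.8). ZK: (T, K) factor excess returns."""
    T, K = ZK.shape
    lam = ZK.mean(axis=0)                                     # (6.3.1)
    dev = ZK - lam
    V = dev.T @ dev / T**2                                    # (6.3.2): Omega_K_hat / T
    J3 = (T - K) / (T * K) * lam @ np.linalg.solve(V, lam)    # (6.3.7): F(K, T-K) under the null
    J4 = lam / np.sqrt(np.diag(V))                            # (6.3.8): one statistic per factor
    return lam, V, J3, J4


rng = np.random.default_rng(4)
ZK = 0.005 + 0.04 * rng.normal(size=(120, 3))   # hypothetical monthly excess returns
lam, V, J3, J4 = factor_premium_tests(ZK)

# Cross-check with the textbook Hotelling T^2 form (np.cov uses the T - 1 divisor).
T, K = ZK.shape
S = np.cov(ZK, rowvar=False)
T2 = T * lam @ np.linalg.solve(S, lam)
F_hotelling = (T - K) / (K * (T - 1)) * T2
```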
Shanken (1992b) shows that factor risk premia can also be estimated using a two-pass cross-sectional regression approach. In the first pass the factor sensitivities are estimated asset-by-asset using OLS. These estimators represent a measure of the factor loading matrix $B$, which we denote $\hat B$. This estimator of $B$ will be identical to the unconstrained maximum likelihood estimators previously presented for jointly normal and IID residuals.

Using this estimator of $B$ and the $(N \times 1)$ vector of asset returns for each time period, the ex post factor risk premia can be estimated time-period-by-time-period in the second pass. The second-pass regression is

$$R_t = \iota\lambda_{0t} + \hat B\lambda_{Kt} + \eta_t. \tag{6.3.9}$$

The regression can be consistently estimated using OLS; however, GLS can also be used. The output of the regression is a time series of ex post risk premia, $\hat\lambda_{Kt}$, $t = 1, \ldots, T$, and an ex post measure of the zero-beta portfolio return, $\hat\lambda_{0t}$, $t = 1, \ldots, T$.

Common practice is then to conduct inferences about the risk premia 
using the means and standard deviations of these ex post series. While this 
approach is a reasonable approximation, Shanken (1992b) shows that the 
calculated standard errors of the means will understate the true standard 
errors because they do not account for the estimation error in B. Shanken 
derives an adjustment which gives consistent standard errors. No adjust- 
ment is needed when a maximum likelihood approach is used, because the 
maximum likelihood estimators already incorporate the adjustment. 
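A sketch of the two-pass procedure around (6.3.9), assuming OLS in both passes (GLS is also possible, as noted above). The function name and the noise-free test data are hypothetical. The standard errors it reports are the naive ones the text warns about; Shanken's (1992b) adjustment is not implemented here.

```python
import numpy as np


def two_pass(R, F):
    """Two-pass cross-sectional regression, (6.3.9). R: (T, N) returns, F: (T, K) factors.

    Returns B_hat (N, K), lam_t (T, K+1) with columns [lambda_0t, lambda_Kt'],
    their time-series means, and naive standard errors of those means.
    """
    T, N = R.shape
    # First pass: time-series OLS, asset by asset, with an intercept
    X = np.column_stack([np.ones(T), F])
    B_hat = np.linalg.lstsq(X, R, rcond=None)[0][1:].T
    # Second pass: one cross-sectional OLS of R_t on [iota, B_hat] per period
    Xc = np.column_stack([np.ones(N), B_hat])
    lam_t = np.linalg.lstsq(Xc, R.T, rcond=None)[0].T
    means = lam_t.mean(axis=0)
    # Naive standard errors: they ignore estimation error in B_hat, which is
    # the understatement that Shanken (1992b) corrects.
    naive_se = lam_t.std(axis=0, ddof=1) / np.sqrt(T)
    return B_hat, lam_t, means, naive_se


# Hypothetical noise-free check: R_t = iota*lam0 + B (gamma1 + f_t), so the first pass
# recovers B exactly and the second pass returns lambda_0t = lam0, lambda_Kt = gamma1 + f_t.
rng = np.random.default_rng(5)
T, N, K = 60, 25, 2
F = rng.normal(size=(T, K))
B = rng.normal(size=(N, K))
lam0, gamma1 = 0.004, np.array([0.006, -0.002])
R = lam0 + (gamma1 + F) @ B.T
B_hat, lam_t, means, naive_se = two_pass(R, F)
```

Passing a two-dimensional right-hand side to `np.linalg.lstsq` solves all $T$ cross-sectional regressions in one call.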


6.4 Selection of Factors 


The estimation and testing results in Section 6.2 assume that the identity of the factors is known. In this section we address the issue of specifying the factors. The approaches fall into two basic categories, statistical and theoretical. The statistical approaches, largely motivated by the APT, involve building factors from a comprehensive set of asset returns (usually much larger than the set of returns used to estimate and test the model). Sample data on these returns are used to construct portfolios that represent factors. The theoretical approaches involve specifying factors based on arguments that the factors capture economy-wide systematic risks.


6.4.1 Statistical Approaches


Our starting point for the statistical construction of factors is the linear factor model. We present the analysis in terms of real returns. The same analysis will apply to excess returns in cases with a riskfree asset. Recall that for the linear model we have

$$R_t = a + B f_t + \epsilon_t \tag{6.4.1}$$

$$E[\epsilon_t\epsilon_t' \mid f_t] = \Sigma, \tag{6.4.2}$$

where $R_t$ is the $(N \times 1)$ vector of asset returns for time period $t$, $f_t$ is the $(K \times 1)$ vector of factor realizations for time period $t$, and $\epsilon_t$ is the $(N \times 1)$ vector of model disturbances for time period $t$. The number of assets, $N$, is now very large and usually much larger than the number of time periods, $T$. There are two primary statistical approaches, factor analysis and principal components.


Factor Analysis

Estimation using factor analysis involves a two-step procedure. First the factor sensitivity matrix $B$ and the disturbance covariance matrix $\Sigma$ are estimated, and then these estimates are used to construct measures of the factor realizations. For standard factor analysis it is assumed that there is a strict factor structure. With this structure $K$ factors account for all the cross covariance of asset returns and hence $\Sigma$ is diagonal. (Ross imposes this structure in his original development of the APT.)

Given a strict factor structure and $K$ factors, we can express the $(N \times N)$ covariance matrix of asset returns as the sum of two components, the variation from the factors plus the residual variation,

$$\Omega = B\Omega_K B' + D, \tag{6.4.3}$$

where $E[f_t f_t'] = \Omega_K$ and $\Sigma = D$ to indicate it is diagonal. With the factors unknown, a rotational indeterminacy exists and $B$ is identified only up to a nonsingular transformation. This rotational indeterminacy can be eliminated by restricting the factors to be orthogonal to each other and to have unit variance. In this case we have $\Omega_K = I$ and $B$ is unique up to an orthogonal transformation. All transforms $BG$ are equivalent for any $(K \times K)$ orthogonal transformation matrix $G$, i.e., $GG' = I$. With these restrictions in place we can express the return covariance matrix as


$$\Omega = BB' + D. \tag{6.4.4}$$

With the structure in (6.4.4) and the assumption that asset returns are jointly normal and temporally IID, estimators of $B$ and $D$ can be formulated using maximum likelihood factor analysis. Because the first-order conditions for maximum likelihood are highly nonlinear in the parameters, solving for the estimators with the usual iterative procedure can be slow and convergence difficult. Alternative algorithms have been developed by Jöreskog (1967) and Rubin and Thayer (1982) which facilitate quick convergence to the maximum likelihood estimators.
One interpretation of the maximum likelihood estimator of $B$ given the maximum likelihood estimator of $D$ is that $\hat D^{-1}$ times the estimator of $B$ has the eigenvectors of $\hat D^{-1}\hat\Omega$ associated with the $K$ largest eigenvalues as its columns. For details of the estimation the interested reader can see these papers, or Morrison (1990, chapter 9) and references therein.

The second step in the estimation procedure is to estimate the factors given $B$ and $D$. Since the factors are derived from the covariance structure, the means are not specified in (6.4.1). Without loss of generality, we can restrict the factors to have zero means and express the factor model in terms of deviations about the means,

$$(R_t - \mu) = B f_t + \epsilon_t. \tag{6.4.5}$$

Given (6.4.5), a candidate to proxy for the factor realizations for time period $t$ is the cross-sectional generalized least squares (GLS) regression estimator. Using the maximum likelihood estimators of $B$ and $D$ we have for each $t$

$$\hat f_t = (\hat B'\hat D^{-1}\hat B)^{-1}\hat B'\hat D^{-1}(R_t - \hat\mu). \tag{6.4.6}$$

Here we are estimating $f_t$ by regressing $(R_t - \hat\mu)$ onto $\hat B$. The factor realization series, $\hat f_t$, $t = 1, \ldots, T$, can be employed to test the model using the approach in Section 6.2.3.

Since the factors are linear combinations of returns we can construct portfolios which are perfectly correlated with the factors. Denoting $R_{Kt}$ as the $(K \times 1)$ vector of factor portfolio returns for time period $t$, we have

$$R_{Kt} = AWR_t, \tag{6.4.7}$$

where

$$W = (\hat B'\hat D^{-1}\hat B)^{-1}\hat B'\hat D^{-1},$$

and $A$ is defined as a diagonal matrix with $1/W_j$ as the $j$th diagonal element, where $W_j$ is the $j$th element of $W\iota$.

The factor portfolio weights obtained for the $j$th factor from this procedure are equivalent to the weights that would result from solving the following optimization problem and then normalizing the weights to sum to one:

$$\min_{w_j}\; w_j' D w_j \tag{6.4.8}$$

subject to

$$w_j' b_k = 0 \quad \forall k \neq j \tag{6.4.9}$$

$$w_j' b_j = 1. \tag{6.4.10}$$
That is, the factor portfolio weights minimize the residual variance subject to the constraints that each factor portfolio has a unit loading on its own factor and zero loadings on other factors. The resulting factor portfolio returns can be used in all the approaches discussed in Section 6.2.
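Given estimates of $B$ and $D$ (for instance from maximum likelihood factor analysis, which is not shown), (6.4.6) and (6.4.7) are a pair of matrix products. In this sketch the simulated true $B$ and $D$ stand in for the estimates, and `gls_factor_portfolios` is a hypothetical name. The checks confirm that $WB = I$, so before rescaling each row of $W$ has a unit loading on its own factor and zero loadings on the others, and that each factor portfolio is perfectly correlated with the corresponding GLS factor estimate.

```python
import numpy as np


def gls_factor_portfolios(R, B, d):
    """(6.4.6) and (6.4.7) given loadings B (N, K) and the diagonal d (N,) of D."""
    mu = R.mean(axis=0)
    BtDinv = B.T / d                         # B' D^{-1}, shape (K, N)
    W = np.linalg.solve(BtDinv @ B, BtDinv)  # (B' D^{-1} B)^{-1} B' D^{-1}
    f_hat = (R - mu) @ W.T                   # (6.4.6): row t is f_hat_t'
    A = np.diag(1.0 / W.sum(axis=1))         # 1 / (W iota)_j on the diagonal
    RK = R @ (A @ W).T                       # (6.4.7): factor portfolio returns
    return W, f_hat, A @ W, RK


# Hypothetical strict factor structure; true B and D stand in for ML estimates.
rng = np.random.default_rng(6)
T, N, K = 200, 30, 3
B = rng.normal(size=(N, K))
d = rng.uniform(0.5, 2.0, size=N)
f = rng.normal(size=(T, K))
R = 0.01 + f @ B.T + rng.normal(size=(T, N)) * np.sqrt(d)
W, f_hat, weights, RK = gls_factor_portfolios(R, B, d)
corr = [abs(np.corrcoef(f_hat[:, j], RK[:, j])[0, 1]) for j in range(K)]
```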

If $B$ and $D$ are known, then the factor estimators based on GLS with the population values of $B$ and $D$ will have the maximum correlation with the population factors. This follows from the minimum-variance unbiased estimator property of generalized least squares given the assumed normality of the disturbance vector. But in practice the factors in (6.4.6) and (6.4.7) need not have the maximum correlation with the population common factors since they are based on estimates of $B$ and $D$. Lehmann and Modest (1988) present an alternative to GLS. In the presence of measurement error, they find this alternative can produce factor portfolios with a higher population correlation with the common factors. They suggest for the $j$th factor to use $w_j'R_t$, where the $(N \times 1)$ vector $w_j$ is the solution to the following problem:

$$\min_{w_j}\; w_j' D w_j \tag{6.4.11}$$

subject to

$$w_j' b_k = 0 \quad \forall k \neq j \tag{6.4.12}$$

$$w_j'\iota = 1. \tag{6.4.13}$$

This approach finds the portfolio which has the minimum residual variance of all portfolios orthogonal to the other $(K-1)$ factors. Unlike the GLS procedure, this procedure ignores the information in the factor loadings of the $j$th factor. It is possible that this is beneficial because of the measurement error in the loadings. Indeed, Lehmann and Modest find that this method of forming factor portfolios results in factors with less extreme weightings on the assets and a resulting higher correlation with the underlying common factors.
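Both the GLS problem (6.4.8) to (6.4.10) and the Lehmann and Modest problem (6.4.11) to (6.4.13) minimize $w'Dw$ subject to linear equality constraints $C'w = e$, whose Lagrangian solution is $w = D^{-1}C(C'D^{-1}C)^{-1}e$. The sketch below uses that closed form; the function names and simulated inputs are illustrative. It checks that the GLS solution, rescaled to sum to one, is the normalized $j$th row of $W$ in (6.4.7), and that the Lehmann and Modest portfolio never has more residual variance, since the rescaled GLS weights are feasible for its problem.

```python
import numpy as np


def min_var_weights(d, C, e):
    """Solve min w'Dw subject to C'w = e, with D = diag(d): w = D^{-1} C (C' D^{-1} C)^{-1} e."""
    DinvC = C / d[:, None]
    return DinvC @ np.linalg.solve(C.T @ DinvC, e)


def gls_weights(B, d, j):
    """(6.4.8)-(6.4.10), then rescaled so the weights sum to one."""
    e = np.zeros(B.shape[1])
    e[j] = 1.0
    w = min_var_weights(d, B, e)
    return w / w.sum()


def lehmann_modest_weights(B, d, j):
    """(6.4.11)-(6.4.13): zero loading on the other factors, weights sum to one."""
    N, K = B.shape
    C = np.column_stack([np.delete(B, j, axis=1), np.ones(N)])
    e = np.zeros(K)
    e[-1] = 1.0                              # only the sum-to-one constraint is nonzero
    return min_var_weights(d, C, e)


# Hypothetical loadings and residual variances.
rng = np.random.default_rng(7)
N, K, j = 40, 3, 0
B = rng.normal(size=(N, K))
d = rng.uniform(0.5, 2.0, size=N)
w_gls = gls_weights(B, d, j)
w_lm = lehmann_modest_weights(B, d, j)
W = np.linalg.solve((B.T / d) @ B, B.T / d)   # the GLS matrix W of (6.4.7)
```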


Principal Components 

Factor analysis represents only one statistical method of forming factor portfolios. An alternative approach is principal components analysis. Principal components is a technique to reduce the number of variables being studied without losing too much information in the covariance matrix. In the present application, the objective is to reduce the dimension from $N$ asset returns to $K$ factors. The principal components serve as the factors. The first principal component is the (normalized) linear combination of asset returns with maximum variance. The second principal component is the (normalized) linear combination of asset returns with maximum variance of all combinations orthogonal to the first principal component. And so on.
The first sample principal component is $x_1^{*\prime}R_t$, where the $(N \times 1)$ vector $x_1^*$ is the solution to the following problem:

$$\max_{x_1}\; x_1'\hat\Omega x_1 \tag{6.4.14}$$

subject to

$$x_1'x_1 = 1. \tag{6.4.15}$$

$\hat\Omega$ is the sample covariance matrix of returns. The solution $x_1^*$ is the eigenvector associated with the largest eigenvalue of $\hat\Omega$. To facilitate the portfolio interpretation of the factors we can define the first factor as $w_1'R_t$, where $w_1$ is $x_1^*$ scaled by the reciprocal of $\iota'x_1^*$ so that its elements sum to one. The second sample principal component solves the above problem for $x_2$ in the place of $x_1$ with the additional restriction $x_1^{*\prime}x_2 = 0$. The solution $x_2^*$ is the eigenvector associated with the second largest eigenvalue of $\hat\Omega$. $x_2^*$ can be scaled by the reciprocal of $\iota'x_2^*$ giving $w_2$, and then the second factor portfolio will be $w_2'R_t$. In general the $j$th factor will be $w_j'R_t$, where $w_j$ is the rescaled eigenvector associated with the $j$th largest eigenvalue of $\hat\Omega$. The factor portfolios derived from the first $K$ principal components can then be employed as factors for all the tests outlined in Section 6.2.
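The sample principal components in (6.4.14) and (6.4.15) are eigenvectors of the sample covariance matrix, so a sketch needs only `np.linalg.eigh`. The function name and simulated returns are illustrative; the rescaling to weights that sum to one follows the text and breaks down if $\iota'x_j^*$ is close to zero.

```python
import numpy as np


def pc_factor_portfolios(R, K):
    """First K sample principal components as factor portfolios, (6.4.14)-(6.4.15)."""
    Omega = np.cov(R, rowvar=False, bias=True)  # sample covariance matrix, divisor T
    eigval, eigvec = np.linalg.eigh(Omega)        # eigh returns eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:K]
    X = eigvec[:, top]                            # columns x_1*, ..., x_K*
    # Rescale each x_j* by 1 / (iota' x_j*); unstable if iota' x_j* is near zero.
    W = X / X.sum(axis=0)
    return eigval[top], X, W, R @ W


rng = np.random.default_rng(8)
R = rng.normal(size=(250, 10)) @ rng.normal(size=(10, 10))   # hypothetical correlated returns
eigvals, X, W, RK = pc_factor_portfolios(R, K=3)
Omega = np.cov(R, rowvar=False, bias=True)
```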

Another principal components approach has been developed by Connor and Korajczyk (1986, 1988).⁴ They propose using the eigenvectors associated with the $K$ largest eigenvalues of the $(T \times T)$ centered returns cross-product matrix rather than the standard approach which uses the principal components of the $(N \times N)$ sample covariance matrix. They show that as the cross section becomes large the $(K \times T)$ matrix with the rows consisting of the $K$ eigenvectors of the cross-product matrix will converge to the matrix of factor realizations (up to a nonsingular linear transformation reflecting the rotational indeterminacy of factor models). The potential advantages of this approach are that it allows for time-varying factor risk premia and that it is computationally convenient. Because it is typical to have a cross section of assets much larger than the number of time-series observations, analyzing a $(T \times T)$ matrix can be less burdensome than working with an $(N \times N)$ sample covariance matrix.
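A sketch of the Connor and Korajczyk idea: take the $K$ leading eigenvectors of the $(T \times T)$ matrix built from centered returns. The scaling by $1/N$, the centering of each asset's series, and the simulated data are assumptions for illustration, not details from the text. Because the factors are identified only up to a nonsingular transformation, the check regresses each true factor on the estimated rows rather than comparing them directly.

```python
import numpy as np


def ck_factors(R, K):
    """Factors from the (T x T) cross-product matrix. R: (T, N) with N much larger than T.

    Returns a (K, T) array whose rows are the K leading eigenvectors. Hypothetical helper.
    """
    T, N = R.shape
    Rc = R - R.mean(axis=0)           # center each asset's return series
    Omega_T = Rc @ Rc.T / N           # (T, T) instead of (N, N)
    eigval, eigvec = np.linalg.eigh(Omega_T)
    top = np.argsort(eigval)[::-1][:K]
    return eigvec[:, top].T


rng = np.random.default_rng(9)
T, N, K = 60, 2000, 2
F = rng.normal(size=(T, K))
R = F @ rng.normal(size=(N, K)).T + rng.normal(size=(T, N))
G = ck_factors(R, K)
# R^2 from regressing each true factor on a constant and the estimated factor rows.
Xg = np.column_stack([np.ones(T), G.T])
fit = Xg @ np.linalg.lstsq(Xg, F, rcond=None)[0]
r2 = 1 - ((F - fit) ** 2).sum(axis=0) / ((F - F.mean(axis=0)) ** 2).sum(axis=0)
```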


Factor Analysis or Principal Components? 

We have discussed two primary statistical approaches for constructing the model factors: factor analysis and principal components. Within each approach there are possible variations in the process of estimating the factors. A question arises as to which technique is optimal in the sense of providing the most precise measures of the population factors given a fixed sample of returns. Unfortunately the answer in finite samples is not clear, although all procedures can be justified in large samples.

⁴See also Mei (1993).
Chamberlain and Rothschild (1983) show that consistent estimates of the factor loading matrix $B$ can be obtained from the eigenvectors associated with the largest eigenvalues of $Y^{-1}\Omega$, where $Y$ is any arbitrary positive definite matrix with eigenvalues bounded away from zero and infinity. Both standard factor analysis and principal components fit into this category; for factor analysis $Y = D$ and for principal components $Y = I$. However, the finite-sample applicability of the result is unclear since it is required that both the number of assets $N$ and the number of time periods $T$ go to infinity.

The Connor and Korajczyk principal components approach is also consistent as $N$ increases. It has the further potential advantage that it only requires $T > K$ and does not require $T$ to increase to infinity. However, whether in finite samples it dominates factor analysis or standard principal components is an open question.


6.4.2 Number of Factors


The underlying theory of the multifactor models does not specify the number of factors that are required, that is, the value of $K$. While, for the theory to be useful, $K$ should be reasonably small, the researcher still has significant latitude in the choice. In empirical work this lack of specification has been handled in several ways. One approach is to repeat the estimation and testing of the model for a variety of values of $K$ and observe if the tests are sensitive to increasing the number of factors. For example, Lehmann and Modest (1988) present empirical results for five, ten, and fifteen factors. Their results display minimal sensitivity when the number of factors increases from five to ten to fifteen. Similarly, Connor and Korajczyk (1988) consider five and ten factors with little sensitivity to the additional five factors. These results suggest that five factors are adequate.

A second approach is to test explicitly for the adequacy of $K$ factors. An asymptotic likelihood ratio test of the adequacy of $K$ factors can be constructed using $-2$ times the difference of the value of the log-likelihood function of the covariance matrix evaluated at the constrained and unconstrained estimators. Morrison (1990, p. 362) presents this test. The likelihood ratio test statistic is

$$J_5 = -\left(T - 1 - \frac{2N + 5}{6} - \frac{2K}{3}\right)\left[\log|\hat\Omega| - \log|\hat B\hat B' + \hat D|\right], \tag{6.4.16}$$

where $\hat\Omega$ is the maximum likelihood estimator of $\Omega$ and $\hat B$ and $\hat D$ are the maximum likelihood estimators of $B$ and $D$, respectively. The leading term is an adjustment to improve the convergence of the finite-sample null distribution to the large-sample distribution. Under the null hypothesis that $K$ factors are adequate, $J_5$ will be asymptotically distributed ($T \to \infty$) as a chi-square variate with $\frac{1}{2}[(N - K)^2 - N - K]$ degrees of freedom. Roll and Ross (1980) use this approach and conclude that three or four factors are adequate.
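Given maximum likelihood estimates $\hat\Omega$, $\hat B$, and $\hat D$, (6.4.16) and its degrees of freedom are simple to compute. Producing those estimates requires a factor-analysis routine that is not shown; the function name is hypothetical, and the check uses a covariance matrix with an exact $K$-factor structure, for which the statistic is zero.

```python
import numpy as np


def lr_factor_number(Omega_hat, B_hat, d_hat, T):
    """J_5 of (6.4.16) and its degrees of freedom; inputs are assumed to be ML estimates."""
    N, K = B_hat.shape
    fitted = B_hat @ B_hat.T + np.diag(d_hat)
    _, logdet_fit = np.linalg.slogdet(fitted)
    _, logdet_sample = np.linalg.slogdet(Omega_hat)
    scale = T - 1 - (2 * N + 5) / 6 - 2 * K / 3    # small-sample adjustment
    J5 = -scale * (logdet_sample - logdet_fit)
    df = ((N - K) ** 2 - N - K) / 2
    return J5, df


rng = np.random.default_rng(10)
N, K, T = 10, 2, 500
B_hat = rng.normal(size=(N, K))
d_hat = rng.uniform(0.5, 1.5, size=N)
# Hypothetical check: an exact K-factor covariance matrix gives J5 = 0.
Omega_exact = B_hat @ B_hat.T + np.diag(d_hat)
J5, df = lr_factor_number(Omega_exact, B_hat, d_hat, T)
```

A p-value is `scipy.stats.chi2.sf(J5, df)`; the test is only informative when the degrees of freedom are positive.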

A potential drawback of using the test from maximum likelihood factor analysis is that the constrained model assumes a strict factor structure, an assumption which is not theoretically necessary. Connor and Korajczyk (1993) develop an asymptotic test ($N \to \infty$) for the adequacy of $K$ factors under the assumption of an approximate factor structure. Their test uses the result that with an approximate factor structure the average cross-sectional variation explained by the $(K+1)$th factor approaches zero as $N$ increases,

$$\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} b_{K+1,i}^2 = 0, \tag{6.4.17}$$

where the dependence of $b_{K+1,i}$ on $N$ is implicit. This implies that in a large cross section generated by a $K$-factor model, the average residual variance in a linear factor model estimated with $K+1$ factors should converge to the average residual variance with $K$ factors. This is the implication Connor and Korajczyk test. Examining returns from stocks listed on the New York Stock Exchange and the American Stock Exchange, they conclude that there are up to six pervasive factors.


6.4.3 Theoretical Approaches 


Theoretically based approaches for selecting factors fall into two main categories. One approach is to specify macroeconomic and financial market variables that are thought to capture the systematic risks of the economy. A second approach is to specify characteristics of firms which are likely to explain differential sensitivity to the systematic risks and then form portfolios of stocks based on the characteristics.

Chen, Roll, and Ross (1986) is a good example of the first approach. The authors argue that in selecting factors we should consider forces which will explain changes in the discount rate used to discount future expected cash flows and forces which influence expected cash flows themselves. Based on intuitive analysis and empirical investigation a five-factor model is proposed. The factors include the yield spread between long and short interest rates for US government bonds (maturity premium), expected inflation, unexpected inflation, industrial production growth, and the yield spread between corporate high- and low-grade bonds (default premium). Aggregate consumption growth and oil prices are found not to have incremental effects beyond the five factors.⁵


⁵An alternative implementation of the first approach is given by Campbell (1996) and is discussed in Chapter 8.
The second approach of creating factor portfolios based on firm characteristics has been used in a number of studies. These characteristics have mostly surfaced from the literature of CAPM violations discussed in Chapter 5. Characteristics which have been found to be empirically important include market value of equity, price-to-earnings ratio, and ratio of book value of equity to market value of equity. The general finding is that factor models which include a broad-based market portfolio (such as an equal-weighted index) and factor portfolios created using these characteristics do a good job in explaining the cross section of returns. However, because the important characteristics have been identified largely through empirical analysis, their importance may be overstated because of data-snooping biases. We will discuss this issue in Section 6.6.


6.5 Empirical Results 


Many empirical studies of multifactor models exist. We will review four of the studies which nicely illustrate the estimation and testing methodology we have discussed. Two comprehensive studies using statistical approaches to select the factors are Lehmann and Modest (1988) and Connor and Korajczyk (1988). Lehmann and Modest [LM] use factor analysis and Connor and Korajczyk [CK] use $(T \times T)$ principal components. Two studies using the theoretical approach to factor identification are Fama and French (1993) and Chen, Roll, and Ross (1986). Fama and French [FF] use firm characteristics to form factor portfolios and Chen, Roll, and Ross [CRR] specify macroeconomic variables as factors. The first three studies include tests of the implications of exact factor pricing, while Chen, Roll, and Ross focus on whether or not the factors are priced. The evidence supporting exact factor pricing is mixed. Table 6.1 summarizes the main results from LM, CK, and FF.

A number of general points emerge from this table. The strongest evidence against exact factor pricing comes from tests using dependent portfolios based on market value of equity and book-to-market ratios. Even multifactor models have difficulty explaining the "size" effect and "book-to-market" effect. Portfolios which are formed based on dividend yield and based on own variance provide little evidence against exact factor pricing. The CK results for January and non-January months suggest that the evidence against exact factor pricing does not arise from the January effect.

Using the statistical approaches, CK and LM find little sensitivity to increasing the number of factors beyond five. On the other hand, FF find some improvement going from two factors to five factors. In results not included, FF find that with stocks only three factors are necessary and that when bond portfolios are included then five factors are needed. These results are generally consistent with direct tests for the number of factors discussed in Section 6.4.2.

Table 6.1. Summary of results for tests of exact factor pricing using zero-intercept F-test.

Study   Time period    Portfolio characteristic    N    K   p-value
CK      64:01-83:12    market value of equity     10    5   0.002
CK                                                10   10   0.002
CKJ                                               10    5   0.236
CKJ                                               10   10   0.171
CKNJ                                              10    5   0.011
CKNJ                                              10   10   0.019
LM      63:01-82:12    market value of equity      5    5   **
LM                                                 5   10   **
LM                                                 5   15   **
LM                                                20    5   0.11
LM                                                20   10   0.14
LM                                                20   15   0.42
LM      63:01-82:12    dividend yield              5    5   0.17
LM                                                 5   10   0.18
LM                                                 5   15   0.17
LM                                                20    5   0.94
LM                                                20   10   0.97
LM                                                20   15   0.98
LM      63:01-82:12    own variance                5    5   0.29
LM                                                 5   10   0.57
LM                                                 5   15   0.55
LM                                                20    5   0.83
LM                                                20   10   0.97
LM                                                20   15   0.98
FF      63:07-91:12    stocks and bonds           32    2   0.010
FF                                                32    3   0.039
FF                                                32    5   0.025

**Less than 0.001.

CK refers to Connor and Korajczyk (1988), LM refers to Lehmann and Modest (1988), and FF refers to Fama and French (1993). The CK factors are derived using $(T \times T)$ principal components, the LM factors are derived using maximum likelihood factor analysis, and the FF factors are prespecified factor portfolios. For the FF two-factor case the factors are the return on a portfolio of low market value of equity firms minus a portfolio of high market value of equity firms and the return on a portfolio of high book-to-market value firms minus a portfolio of low book-to-market value firms. For the three-factor case the factors are those in the two-factor case plus the return on the CRSP value-weighted stock index. For the five-factor case the returns on a term structure factor and a default risk factor are added. CK include tests separating the intercept for January from the intercept for other months. CKJ are results of tests of the hypothesis that the January intercept is zero and CKNJ are results of tests of the hypothesis that the non-January intercept is zero. CK and FF work with a monthly sampling interval. LM use a daily interval to estimate the factors and a weekly interval for testing. The test results from CK and LM are based on tests from four five-year periods aggregated together. The portfolio characteristic represents the firm characteristic used to allocate stocks into the dependent portfolios. FF use 25 stock portfolios and 7 bond portfolios. The stock portfolios are created using a two-way sort based on market value of equity and book-value-to-market-value ratios. The bond portfolios include five US government bond portfolios and two corporate bond portfolios. The government bond portfolios are created based on maturity and the corporate bond portfolios are created based on the level of default risk. $N$ is the number of dependent portfolios and $K$ is the number of factors. The p-values are reported for the zero-intercept F-test.

The LM results display considerable sensitivity to the number of dependent portfolios included. The p-values are considerably lower with fewer portfolios. This is most likely an issue of the power of the test. For these tests with an unspecified alternative hypothesis, reducing the number of portfolios without eliminating the deviations from the null hypothesis can lead to substantial increases in power, because fewer restrictions must be tested.

The CRR paper focuses on the pricing of the factors. They use a cross-sectional regression methodology which is similar to the approach presented in Section 6.3. As previously noted, they find evidence of five priced factors.

The factors include the yield spread between long and short interest rates 
for US government bonds (maturity premium), expected inflation, unex- 
pected inflation, industrial production growth, and the yield spread between 
corporate high- and low-grade bonds (default premium). 


6.6 Interpreting Deviations from Exact Factor Pricing 


We have just reviewed empirical evidence which suggests that, while multi- 
factor models do a reasonable job of describing the cross section of returns, 
deviations from the models do exist. Given this, it is important to consider the possible sources of deviations from exact factor pricing. This issue is important because in a given finite sample it is always possible to find an additional factor that will make the deviations vanish. However, the procedure of adding an extra factor implicitly assumes that the source of the deviations is a missing risk factor and does not consider other possible explanations.

In this section we analyze the deviations from exact factor pricing for 
a given model with the objective of exploring the source of the deviations. 
For the analysis the potential sources of deviations are categorized into 
two groups—risk-based and nonrisk-based. The objective is to evaluate the 
plausibility of the argument that the deviations from the given factor model 
can be explained by additional risk factors.

The analysis relies on an important distinction between the two categories, namely, a difference in the behavior of the maximum squared Sharpe ratio as the cross section of securities is increased. (Recall that the Sharpe ratio is the ratio of the mean excess return to the standard deviation of the excess return.) For the risk-based alternatives the maximum squared Sharpe ratio is bounded, and for the nonrisk-based alternatives the maximum squared Sharpe ratio is a less useful construct and can, in principle, be unbounded.




6.6.1 Exact Factor Pricing Models, Mean-Variance Analysis,
and the Optimal Orthogonal Portfolio


For the initial analysis we drop back to the level of the primary assets in the economy. Let N be the number of primary assets. Assume that a riskfree asset exists. Let $Z_t$ represent the $(N\times 1)$ vector of excess returns for period t. Assume $Z_t$ is stationary and ergodic with mean $\mu$ and covariance matrix $\Omega$ that is full rank. We also take as given a set of K factor portfolios and analyze the deviations from exact factor pricing. For the factor model, as in (6.2.2), we have

$$Z_t = a + B Z_{Kt} + \epsilon_t. \qquad (6.6.1)$$

Here B is the $(N\times K)$ matrix of factor loadings, $Z_{Kt}$ is the $(K\times 1)$ vector of time-t factor portfolio excess returns, and $a$ and $\epsilon_t$ are $(N\times 1)$ vectors of asset return intercepts and disturbances, respectively. The variance-covariance matrix of the disturbances is $\Sigma$ and the variance-covariance matrix of the factors is $\Omega_K$, as in (6.2.3)-(6.2.6). The values of $a$, $B$, and $\Sigma$ will depend on the factor portfolios, but this dependence is suppressed for notational convenience.

If we have exact factor pricing relative to the K factors, all the elements of the vector $a$ will be zero; equivalently, a linear combination of the factor portfolios forms the tangency portfolio (the mean-variance efficient portfolio of risky assets given the presence of a riskfree asset). Let $Z_{qt}$ be the excess return of the (ex ante) tangency portfolio and let $w_q$ be the $(N\times 1)$ vector of portfolio weights. From mean-variance analysis (see Chapter 5),

$$w_q = (\iota'\Omega^{-1}\mu)^{-1}\,\Omega^{-1}\mu, \qquad (6.6.2)$$

where $\iota$ is an $(N\times 1)$ vector of ones. In the context of the K-factor model in (6.6.1), we have exact factor pricing when the tangency portfolio in (6.6.2) can be formed from a linear combination of the K factor portfolios.
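As a minimal sketch of (6.6.2), the snippet below computes the tangency weights and the maximum squared Sharpe ratio $\mu'\Omega^{-1}\mu$ with NumPy. The three assets and their moments are hypothetical and chosen uncorrelated so the answer can be checked by hand (weights proportional to $\mu_i/\sigma_i^2$); the normalization assumes $\iota'\Omega^{-1}\mu \neq 0$.

```python
import numpy as np

def tangency_portfolio(mu, omega):
    """Tangency weights (iota' Omega^{-1} mu)^{-1} Omega^{-1} mu, as in (6.6.2),
    and the squared Sharpe ratio mu' Omega^{-1} mu of that portfolio."""
    x = np.linalg.solve(omega, mu)       # Omega^{-1} mu without forming the inverse
    w_q = x / x.sum()                    # scale so the weights sum to one
    s_q2 = float(mu @ x)                 # maximum squared Sharpe ratio
    return w_q, s_q2

# Hypothetical monthly inputs: three assets with uncorrelated excess returns
mu = np.array([0.006, 0.008, 0.004])
omega = np.diag([0.0020, 0.0040, 0.0010])
w_q, s_q2 = tangency_portfolio(mu, omega)

# The portfolio's own squared Sharpe ratio should reproduce mu' Omega^{-1} mu
own_s2 = (w_q @ mu) ** 2 / (w_q @ omega @ w_q)
print(w_q, s_q2, own_s2)
```

Solving the linear system rather than inverting $\Omega$ is the usual numerically safer choice.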

Now consider the case where we do not have exact factor pricing, so the tangency portfolio cannot be formed from a linear combination of the factor portfolios. Our interest is in developing the relation between the deviations from the asset pricing model, $a$, and the residual covariance matrix, $\Sigma$. To facilitate this, we define the optimal orthogonal portfolio,¹ which is the unique portfolio that can be combined with the K factor portfolios to form the tangency portfolio and is orthogonal to the factor portfolios.


Definition (optimal orthogonal portfolio). Take as given K factor portfolios which cannot be combined to form the tangency portfolio or the global minimum-variance portfolio. A portfolio h will be defined as the optimal orthogonal portfolio with respect to these K factor portfolios if

$$w_q = W_p b + w_h(1 - \iota' b) \qquad (6.6.3)$$

and

$$w_h'\,\Omega\,W_p = 0 \qquad (6.6.4)$$

for a $(K\times 1)$ vector b, where $W_p$ is the $(N\times K)$ matrix of asset weights for the factor portfolios, $w_h$ is the $(N\times 1)$ vector of asset weights for the optimal orthogonal portfolio, and $w_q$ is the $(N\times 1)$ vector of asset weights for the tangency portfolio. If one considers a model without any factor portfolios (K = 0) then the optimal orthogonal portfolio will be the tangency portfolio.

¹See Roll (1980) for general properties of orthogonal portfolios.


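The weights of portfolio h can be expressed in terms of the parameters of the K-factor model. The vector of weights is

$$w_h = (\iota'\Omega^{-1}a)^{-1}\,\Omega^{-1}a = (\iota'\Sigma^{+}a)^{-1}\,\Sigma^{+}a, \qquad (6.6.5)$$

where the + superscript indicates a generalized inverse ($\Sigma$ is singular when the factor portfolios are combinations of the N assets). The usefulness of this portfolio comes from the fact that when $Z_{ht}$ is added to (6.6.1) the intercept will vanish and the factor loading matrix B will not be altered. The optimality restriction in (6.6.3) leads to the intercept vanishing, and the orthogonality condition in (6.6.4) leads to B being unchanged. Adding in $Z_{ht}$:

$$Z_t = B Z_{Kt} + \beta_h Z_{ht} + u_t \qquad (6.6.6)$$
$$E[u_t] = 0 \qquad (6.6.7)$$
$$E[u_t u_t'] = \Phi \qquad (6.6.8)$$
$$E[Z_{ht}] = \mu_h, \quad E[(Z_{ht} - \mu_h)^2] = \sigma_h^2 \qquad (6.6.9)$$
$$\mathrm{Cov}[Z_{Kt}, Z_{ht}] = 0 \qquad (6.6.10)$$
$$\mathrm{Cov}[u_t, Z_{ht}] = 0. \qquad (6.6.11)$$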
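The defining properties of portfolio h can be checked numerically. The sketch below assumes hypothetical population moments for four primary assets and a single factor portfolio with equal weights (an arbitrary choice); it builds $w_h$ from the first expression in (6.6.5) and verifies orthogonality (6.6.4), the intercept relation $a = \beta_h\mu_h$ that follows from (6.6.6), and that the tangency portfolio is spanned by the factor portfolio and h, as in (6.6.3).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
iota = np.ones(N)

# Hypothetical population moments for N primary assets (excess returns)
A = rng.normal(size=(N, N))
omega = A @ A.T / N + 0.01 * np.eye(N)       # positive-definite covariance matrix
mu = rng.uniform(0.002, 0.010, size=N)       # mean excess returns

# One factor portfolio (K = 1); equal weights are an arbitrary illustrative choice
w_p = np.full(N, 1.0 / N)
mu_K = w_p @ mu
omega_K = w_p @ omega @ w_p
B = omega @ w_p / omega_K                    # loadings: Cov(Z, Z_K) / Var(Z_K)
a = mu - B * mu_K                            # intercepts of (6.6.1)

# Optimal orthogonal portfolio from the first expression in (6.6.5)
x = np.linalg.solve(omega, a)
w_h = x / (iota @ x)
mu_h = w_h @ mu
var_h = w_h @ omega @ w_h
beta_h = omega @ w_h / var_h                 # loadings of the assets on Z_h

orthogonality = w_h @ omega @ w_p                    # (6.6.4): should be ~0
intercept_gap = np.max(np.abs(a - beta_h * mu_h))    # a = beta_h * mu_h

# (6.6.3): the tangency portfolio lies in the span of w_p and w_h
y = np.linalg.solve(omega, mu)
w_q = y / (iota @ y)
basis = np.column_stack([w_p, w_h])
coef, *_ = np.linalg.lstsq(basis, w_q, rcond=None)
span_gap = np.max(np.abs(basis @ coef - w_q))
print(orthogonality, intercept_gap, span_gap, coef)
```

The weights in `coef` play the roles of $b$ and $1 - \iota'b$ in (6.6.3), so they sum to one.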


We can relate the optimal orthogonal portfolio parameters to the factor model deviations by comparing (6.6.1) and (6.6.6). Taking the unconditional expectations of both sides,

$$a = \beta_h\mu_h, \qquad (6.6.12)$$

and by equating the variance of $\epsilon_t$ with the variance of $\beta_h Z_{ht} + u_t$,

$$\Sigma = \beta_h\beta_h'\sigma_h^2 + \Phi = aa'\frac{\sigma_h^2}{\mu_h^2} + \Phi. \qquad (6.6.13)$$
The key link between the model deviations and the residual variances and covariances emerges from (6.6.13). The intuition for the link is straightforward. Deviations from the model must be accompanied by a common component in the residual variance to prevent the formation of a portfolio with a positive deviation and a residual variance that decreases to zero as the number of securities in the portfolio grows, that is, an asymptotic arbitrage opportunity.
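A short calculation illustrates this link. With (6.6.13) written as $\Sigma = aa'/s_h^2 + \Phi$ (using $\sigma_h^2/\mu_h^2 = 1/s_h^2$) and, as a simplifying assumption of this sketch, a diagonal $\Phi$, the Sherman-Morrison formula gives $a'\Sigma^{-1}a = c/(1 + c/s_h^2)$ with $c = a'\Phi^{-1}a$. The deviations, residual variances, and $s_h^2$ below are hypothetical. With the common component, the quadratic stays below $s_h^2$ however many assets are added; without it ($\Sigma = \Phi$), the quadratic grows with the number of assets.

```python
import random

def quad_with_common_component(a, phi, s_h2):
    """a' Sigma^{-1} a for Sigma = a a' / s_h2 + diag(phi), the form of (6.6.13);
    by Sherman-Morrison this equals c / (1 + c / s_h2) with c = a' Phi^{-1} a."""
    c = sum(ai * ai / p for ai, p in zip(a, phi))
    return c / (1.0 + c / s_h2)

def quad_without_common_component(a, phi):
    """a' Sigma^{-1} a when Sigma = diag(phi): deviations unrelated to covariances."""
    return sum(ai * ai / p for ai, p in zip(a, phi))

rng = random.Random(1)
s_h2 = 0.021                                   # hypothetical squared Sharpe ratio of h
rows = []
for n in (10, 100, 1000, 10000):
    a = [rng.gauss(0.0, 0.001) for _ in range(n)]    # hypothetical monthly deviations
    phi = [0.002] * n                                  # hypothetical residual variances
    rows.append((n, quad_with_common_component(a, phi, s_h2),
                 quad_without_common_component(a, phi)))
for row in rows:
    print(row)
```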


6.6.2 Squared Sharpe Ratios 


The squared Sharpe ratio is a useful construct for interpreting much of the ensuing analysis. The tangency portfolio q has the maximum squared Sharpe ratio of all portfolios. The squared Sharpe ratio of q, $s_q^2$, is

$$s_q^2 = \mu'\Omega^{-1}\mu. \qquad (6.6.14)$$

Given that the K factor portfolios and the optimal orthogonal portfolio h can be combined to form the tangency portfolio, the maximum squared Sharpe ratio of these K+1 portfolios will be $s_q^2$. Since h is orthogonal to the K factor portfolios, MacKinlay (1995) shows that one can express $s_q^2$ as the sum of the squared Sharpe ratio of the orthogonal portfolio and the squared maximum Sharpe ratio of the factor portfolios,

$$s_q^2 = s_h^2 + s_K^2, \qquad (6.6.15)$$

where $s_h^2 = \mu_h^2/\sigma_h^2$ and $s_K^2 = \mu_K'\Omega_K^{-1}\mu_K$.

Empirical tests of multifactor models employ subsets of the N assets. The factor portfolios need not be linear combinations of the subset of assets. Results similar to those above will hold within a subset of N assets. For subset analysis when considering the tangency portfolio (of the subset), the maximum squared Sharpe ratio of the assets and factor portfolios, and the optimal orthogonal portfolio for the subset, it is necessary to augment the N assets with the K factor portfolios. Defining $Z_t^*$ as the $((N+K)\times 1)$ vector $[Z_t'\;\; Z_{Kt}']'$ with mean $\mu^*$ and covariance matrix $\Omega^*$, for the tangency portfolio of these N+K assets we have

$$s_{qs}^2 = \mu^{*\prime}\,\Omega^{*-1}\mu^*. \qquad (6.6.16)$$

The subscript s indicates that a subset of the assets is being considered. If any of the factor portfolios is a linear combination of the N assets, it will be necessary to use the generalized inverse in (6.6.16).


²This result is related to the work of Gibbons, Ross, and Shanken (1989).
The analysis (with a subset of assets) involves the quadratic $a'\Sigma^{-1}a$ computed using the parameters for the N assets. Gibbons, Ross, and Shanken (1989) and Lehmann (1987, 1992) provide interpretations of this quadratic term using Sharpe ratios. Assuming $\Sigma$ is of full rank, they show

$$a'\Sigma^{-1}a = s_{qs}^2 - s_K^2. \qquad (6.6.17)$$

Consistent with (6.6.15), for the subset of assets $a'\Sigma^{-1}a$ is the squared Sharpe ratio of the subset's optimal orthogonal portfolio $h_s$. Therefore, for a given subset of assets:

$$s_{hs}^2 = a'\Sigma^{-1}a, \qquad (6.6.18)$$

and

$$s_{qs}^2 = s_{hs}^2 + s_K^2. \qquad (6.6.19)$$
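The identity (6.6.17) can be verified directly. In this sketch all parameters are hypothetical and the K = 2 factors are deliberately not combinations of the N assets, so $\Sigma$ is full rank. The augmented moments follow from (6.6.1): $\mu^* = [(a + B\mu_K)'\;\; \mu_K']'$, and $\Omega^*$ has blocks $B\Omega_K B' + \Sigma$, $B\Omega_K$, and $\Omega_K$. The check works because the linear transformation $Z_t - BZ_{Kt}$ block-diagonalizes $\Omega^*$ without changing the quadratic form.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 5, 2

# Hypothetical factor-model parameters
B = rng.normal(1.0, 0.3, size=(N, K))
a = rng.normal(0.0, 0.001, size=N)
C = rng.normal(size=(N, N))
sigma = 0.001 * (C @ C.T / N) + 0.001 * np.eye(N)   # full-rank residual covariance
mu_K = np.array([0.006, 0.003])
omega_K = np.array([[0.0020, 0.0003],
                    [0.0003, 0.0010]])

# Moments of the augmented (N + K) vector Z* = [Z', Z_K']'
mu_star = np.concatenate([a + B @ mu_K, mu_K])
omega_star = np.block([[B @ omega_K @ B.T + sigma, B @ omega_K],
                       [omega_K @ B.T, omega_K]])

s_q2 = mu_star @ np.linalg.solve(omega_star, mu_star)    # (6.6.16)
s_K2 = mu_K @ np.linalg.solve(omega_K, mu_K)
s_h2 = a @ np.linalg.solve(sigma, a)                       # (6.6.18)
print(s_q2, s_K2 + s_h2)                                   # equal, as in (6.6.19)
```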


Note that the squared Sharpe ratio of the subset's optimal orthogonal portfolio is less than or equal to that of the population optimal orthogonal portfolio, that is,

$$s_{hs}^2 \le s_h^2. \qquad (6.6.20)$$


Next we use the optimal orthogonal portfolio and the Sharpe ratios 
results together with the model deviation residual variance link to develop 
implications for distinguishing among asset pricing models. Hereafter the
s subscript is suppressed. No ambiguity will result since, in the subsequent 
analysis, we will be working only with subsets of the assets. 


6.6.3 Implications for Separating Alternative Theories 


If a given factor model is rejected, a common interpretation is that more (or different) risk factors are required to explain the risk-return relation. This interpretation suggests that one should include additional factors so that the null hypothesis will be accepted. A shortcoming of this popular approach is that there are multiple potential interpretations of why the hypothesis is accepted. One view is that genuine progress in terms of identifying the "right" asset pricing model has been made. But it could also be the case that the apparent success in identifying a better model has come from finding a good within-sample fit through data-snooping. The likelihood of this possibility is increased by the fact that the additional factors lack theoretical motivation.

This section attempts to discriminate between the two interpretations. 
To do this, we compare the distribution of the test statistic under the null 
hypothesis with the distribution under each of the alternatives. 

We reconsider the zero-intercept F-test of the null hypothesis that the intercept vector $a$ from (6.6.1) is 0. Let $H_0$ be the null hypothesis and $H_A$ be the alternative:

$$H_0: a = 0$$
$$H_A: a \neq 0.$$

$H_0$ can be tested using the test statistic $J_1$ from (6.2.12):

$$J_1 = \frac{T-N-K}{N}\left[1 + \hat\mu_K'\hat\Omega_K^{-1}\hat\mu_K\right]^{-1}\hat a'\hat\Sigma^{-1}\hat a, \qquad (6.6.21)$$

where T is the number of time-series observations, N is the number of assets or portfolios of assets included, and K is the number of factor portfolios. The hat superscripts indicate the maximum likelihood estimators. Under the null hypothesis, $J_1$ is unconditionally distributed central F with N degrees of freedom in the numerator and $(T-N-K)$ degrees of freedom in the denominator.
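A minimal sketch of (6.6.21), assuming the maximum likelihood estimates are the OLS intercepts together with divide-by-T residual and factor covariance matrices. The function name `grs_j1` and the simulated returns (generated under the null, so $a = 0$) are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def grs_j1(Z, ZK):
    """J1 of (6.6.21) from a T x N array of asset excess returns Z and a
    T x K array of factor excess returns ZK (ML, divide-by-T moments)."""
    T, N = Z.shape
    K = ZK.shape[1]
    X = np.column_stack([np.ones(T), ZK])          # regressors [1, Z_K]
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)   # OLS equals ML under normality
    a_hat = coef[0]                                # estimated intercepts
    resid = Z - X @ coef
    sigma_hat = resid.T @ resid / T                # ML residual covariance
    mu_K = ZK.mean(axis=0)
    dev = ZK - mu_K
    omega_K = dev.T @ dev / T                      # ML factor covariance
    s_K2 = mu_K @ np.linalg.solve(omega_K, mu_K)
    quad = a_hat @ np.linalg.solve(sigma_hat, a_hat)
    return (T - N - K) / N * quad / (1.0 + s_K2)

# Hypothetical simulated data generated under H0 (zero intercepts)
rng = np.random.default_rng(3)
T, N, K = 342, 10, 1
ZK = rng.normal(0.006, 0.045, size=(T, K))
beta = rng.uniform(0.7, 1.3, size=(1, N))
Z = ZK @ beta + rng.normal(0.0, 0.02, size=(T, N))
print(grs_j1(Z, ZK))    # one draw from F with N and T - N - K degrees of freedom
```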
To interpret deviations from the null hypothesis, we require a general representation for the distribution of $J_1$. Conditional on the factor portfolio returns, the distribution of $J_1$ is

$$J_1 \sim F_{N,T-N-K}(\delta), \qquad (6.6.22)$$

where

$$\delta = T\left[1 + \hat\mu_K'\hat\Omega_K^{-1}\hat\mu_K\right]^{-1} a'\Sigma^{-1}a \qquad (6.6.23)$$

is the noncentrality parameter of the F distribution. If K = 0, then the term $[1 + \hat\mu_K'\hat\Omega_K^{-1}\hat\mu_K]^{-1}$ will not appear in (6.6.21) or in (6.6.23), and $J_1$ will be unconditionally distributed noncentral F.

We consider the distribution of $J_1$ under two different alternatives, which are separated by their implications for the maximum value of the squared Sharpe ratio. With the risk-based multifactor alternative there will be an upper bound on the squared Sharpe ratio, whereas with the nonrisk-based alternatives the maximum squared Sharpe ratio is unbounded as the number of assets increases.

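First consider the distribution of $J_1$ under the alternative hypothesis that deviations are due to missing factors. Drawing on the results for the squared Sharpe ratios, the noncentrality parameter of the F distribution is

$$\delta = T\left[1 + \hat\mu_K'\hat\Omega_K^{-1}\hat\mu_K\right]^{-1} s_h^2. \qquad (6.6.24)$$

From (6.6.20), the third term in (6.6.24) is positive and bounded above by the squared Sharpe ratio of the population optimal orthogonal portfolio. The second term is bounded between zero and one. Thus there is an upper bound for $\delta$,

$$\delta \le T s_h^2 \le T s_q^2, \qquad (6.6.25)$$

where $s_h^2$ and $s_q^2$ in (6.6.25) refer to the population optimal orthogonal and tangency portfolios. The second inequality follows from the fact that the tangency portfolio q has the maximum Sharpe ratio of any asset or portfolio.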
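The bound in (6.6.25) can be illustrated with a few lines of arithmetic. This sketch assumes hypothetical population values $s_h^2 = 0.021$ and $s_q^2 = 0.031$ and T = 342; it evaluates (6.6.24) over a grid of factor squared Sharpe ratios and of subset squared Sharpe ratios that, per (6.6.20), never exceed the population $s_h^2$. No choice of inputs pushes $\delta$ past $T s_h^2$.

```python
def noncentrality_missing_factor(T, s_K2_hat, s_h2_subset):
    """delta of (6.6.24): T * [1 + s_K^2]^{-1} * subset squared Sharpe ratio of h."""
    return T * s_h2_subset / (1.0 + s_K2_hat)

T = 342                  # months, matching the application that follows
s_h2_pop = 0.021         # hypothetical population squared Sharpe ratio of h
s_q2_pop = 0.031         # hypothetical tangency-portfolio squared Sharpe ratio
deltas = []
for s_K2_hat in (0.0, 0.01, 0.05):
    for share in (0.25, 0.5, 1.0):          # subset captures part of s_h^2, per (6.6.20)
        deltas.append(noncentrality_missing_factor(T, s_K2_hat, share * s_h2_pop))
print(max(deltas), T * s_h2_pop, T * s_q2_pop)
```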
Given a maximum value for the squared Sharpe ratio, the upper bound on the noncentrality parameter can be important. With this bound, independent of how one arranges the assets to be included as dependent variables in the pricing model regression and for any value of N, there is a limit on the distance between the null distribution and the distribution of the test statistic under the missing factor alternative. All the assets can be mispriced and yet the bound will still apply.

In contrast, when the alternative one has in mind is that the source of deviations is nonrisk-based, such as data-snooping, market frictions, or market irrationalities, the notion of a maximum squared Sharpe ratio is not useful. The squared Sharpe ratio (and the noncentrality parameter) are in principle unbounded because the theory linking the deviations and the residual variances and covariances does not apply. When comparing alternatives with intercepts of about the same magnitude, one would in general expect to see larger test statistics in this nonrisk-based case.

We examine the informativeness of the above analysis by considering alternatives with realistic parameter values. We consider the distribution of the test statistic for three cases: the null hypothesis, the missing risk factors alternative, and the nonrisk-based alternative. For the risk-based alternative, the framework is designed to be similar to that in Fama and French (1993). For the nonrisk-based alternative we use a setup that is consistent with the analysis of Lo and MacKinlay (1990b) and the work of Lakonishok, Shleifer, and Vishny (1994).

Consider a one-factor asset pricing model using a time series of the excess returns for 32 portfolios as the dependent variable. The one factor (independent variable) is the excess return of the market, so that the zero-intercept null hypothesis is the CAPM. The length of the time series is 342 months. This setup corresponds to that of Fama and French (1993, Table Ч, regression (ii)). The null distribution of the test statistic $J_1$ is

$$J_1 \sim F_{32,309}. \qquad (6.6.26)$$


To define the distribution of $J_1$ under the alternatives of interest, one needs to specify the parameters necessary to calculate the noncentrality parameter. For the risk-based alternative, given a value for the squared Sharpe ratio of the optimal orthogonal portfolio, the distribution corresponding to the upper bound of the noncentrality parameter from (6.6.25) can be considered. The Sharpe ratio of the optimal orthogonal portfolio can be obtained using (6.6.19) given the squared Sharpe ratios of the tangency portfolio and of the included factor portfolio.


³In practice, when using the F-test, N will have to be less than T − K so that $\hat\Sigma$ will be of full rank.
MacKinlay (1995) argues that in a perfect capital markets setting, a reasonable value for the squared Sharpe ratio of the tangency portfolio for an observation interval of one month is 0.031 (or approximately 0.61 for the Sharpe ratio on an annualized basis). This value, for example, corresponds to a portfolio with an annual expected excess return of 10% and a standard deviation of 16%. If the maximum squared Sharpe ratio of the included factor portfolios is the ex post squared Sharpe ratio of the CRSP value-weighted index, the implied maximum squared Sharpe ratio for the optimal orthogonal portfolio is 0.021. This monthly value of 0.021 would be consistent with a portfolio which has an annualized mean excess return of 8% and annualized standard deviation of 16%. We work through the analysis using this value.
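The conversions in the paragraph above can be reproduced with simple arithmetic. This sketch assumes i.i.d. monthly returns, so that means scale by 12 and standard deviations by $\sqrt{12}$; under that assumption the 10%/16% example gives a monthly squared Sharpe ratio of roughly 0.033, so its correspondence with 0.031 is approximate, while the 8%/16% example matches 0.021 closely.

```python
import math

def monthly_sq_sharpe(annual_mean, annual_sd):
    """Monthly squared Sharpe ratio implied by annual mean and s.d. of excess
    returns, assuming i.i.d. months (mean x 12, s.d. x sqrt(12))."""
    return (annual_mean / annual_sd) ** 2 / 12.0

def annual_sharpe(monthly_sq):
    """Annualized Sharpe ratio from a monthly squared Sharpe ratio."""
    return math.sqrt(12.0 * monthly_sq)

s_q2 = 0.031        # tangency portfolio, monthly (value quoted from MacKinlay 1995)
s_h2 = 0.021        # optimal orthogonal portfolio, monthly
print(annual_sharpe(s_q2))              # about 0.61
print(monthly_sq_sharpe(0.10, 0.16))    # about 0.033
print(monthly_sq_sharpe(0.08, 0.16))    # about 0.021
print(s_q2 - s_h2)                      # implied factor squared Sharpe ratio, by (6.6.15)
```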

Using this squared Sharpe ratio for the optimal orthogonal portfolio to calculate the upper bound $\delta = T s_h^2 = 342 \times 0.021 \approx 7.2$ from (6.6.25), the distribution of $J_1$ from equation (6.6.22) is

$$J_1 \sim F_{32,309}(7.2). \qquad (6.6.27)$$


This distribution will be used to characterize the risk-based alternative.

One can specify the distribution for two nonrisk-based alternatives by specifying values of $a$, $\Sigma$, and $\hat\mu_K'\hat\Omega_K^{-1}\hat\mu_K$, and then calculating $\delta$ from (6.6.23). To specify the intercepts we assume that the elements of $a$ are normally distributed with a mean of zero. We consider two values for the standard deviation, 0.0007 and 0.001. When the standard deviation of the elements of $a$ is 0.001, about 95% of deviations will lie between −0.002 and +0.002, an annualized spread of about 4.8%. A standard deviation of 0.0007 for the deviations would correspond to an annual spread of about 3.4%. These spreads are consistent with spreads that could arise from data-snooping.⁴ They are plausible and even somewhat conservative given the contrarian strategy returns presented in papers such as Lakonishok, Shleifer, and Vishny (1993). For $\Sigma$ we use a sample estimate based on portfolios sorted by market capitalization for the Fama and French (1993) sample period 1963 to 1991. The effect of $\hat\mu_K'\hat\Omega_K^{-1}\hat\mu_K$ on $\delta$ will typically be small, so it is set to zero. To get an idea of a reasonable value for the noncentrality parameter given this alternative, the expected value of $\delta$ given the distributional assumption for the elements of $a$ conditional upon $\Sigma = \hat\Sigma$ is considered. The expected value of the noncentrality parameter is 39.4 for a standard deviation of 0.0007 and 80.3 for a standard deviation of 0.001. Using these values for the noncentrality parameter, the distribution of $J_1$ is


$$J_1 \sim F_{32,309}(39.4) \qquad (6.6.28)$$

when the standard deviation of the elements of $a$ is 0.0007, and

$$J_1 \sim F_{32,309}(80.3) \qquad (6.6.29)$$

when it is 0.001.

Figure 6.1. Distributions for the CAPM Zero-Intercept Test Statistic for Four Hypotheses (curves: Null (6.6.26), Alternative 1 (6.6.27), Alternative 2 (6.6.28), Alternative 3 (6.6.29); horizontal axis: F statistic; vertical axis: probability)

⁴With data-snooping the distribution of $J_1$ is not exactly a noncentral F (see Lo and MacKinlay [1990b]). However, for the purposes of this analysis, the noncentral F will be a good approximation.
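The spreads quoted for the nonrisk-based alternatives follow from simple arithmetic, sketched below. The helper uses two standard deviations, the rounding the text applies to the 95% range (1.96 would be exact). The last line relies on a fact of this setup rather than on the authors' data: if the $a_i$ are i.i.d. $N(0, \sigma_a^2)$ and independent of $\Sigma$, then $E[a'\Sigma^{-1}a] = \sigma_a^2\,\mathrm{tr}(\Sigma^{-1})$, so with $\hat\mu_K'\hat\Omega_K^{-1}\hat\mu_K$ set to zero the expected noncentrality parameter scales with $\sigma_a^2$. Rescaling 39.4 gives about 80.4, which agrees with the reported 80.3 up to the rounding of 39.4.

```python
def annualized_spread(sd_monthly, z=2.0):
    """Width of the range +/- z * sd for monthly deviations, scaled to a year.
    z = 2 mirrors the text's rounding of the 95% range (1.96 is exact)."""
    return 2.0 * z * sd_monthly * 12.0

print(annualized_spread(0.001))     # about 0.048, i.e., 4.8% per year
print(annualized_spread(0.0007))    # about 0.034, i.e., 3.4% per year

# E[delta] is proportional to sd^2 when a_i ~ i.i.d. N(0, sd^2), independent of Sigma
implied_delta = 39.4 * (0.001 / 0.0007) ** 2
print(implied_delta)                # about 80.4
```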

A plot of the four distributions from (6.6.26), (6.6.27), (6.6.28), and (6.6.29) is in Figure 6.1. The vertical bar on the plot represents the value 1.91 which Fama and French calculate for the test statistic. From this figure, notice that the distributions under the null hypothesis and the risk-based alternative hypothesis are quite close together. This reflects the impact of the upper bound on the noncentrality parameter. In contrast, the nonrisk-based alternatives' distributions are far to the right of the other two distributions, consistent with the unboundedness of the noncentrality parameter for these alternatives.

Given that Fama and French find a test statistic of 1.91, these results suggest that the missing-risk-factors argument is not the whole story. From Figure 6.1 one can see that 1.91 is still in the upper tail when the distribution of $J_1$ in the presence of missing risk factors is tabulated. The p-value using this distribution is 0.03 for the monthly data. Hence it seems unlikely that missing factors completely explain the deviations.
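The tail probabilities behind this comparison can be approximated without special functions. The sketch below is a Monte Carlo stand-in for an exact noncentral F calculation (a library routine for the noncentral F distribution would give the exact values): it draws $J_1$ from its definition as a ratio of scaled chi-square variables, using N = 32, T − N − K = 309, and the noncentrality parameters from (6.6.26)-(6.6.29), and estimates $P(J_1 > 1.91)$ under each hypothesis. The seed and number of draws are arbitrary choices.

```python
import math
import random

def draw_noncentral_f(rng, d1, d2, delta):
    """One draw of (chi2_{d1}(delta) / d1) / (chi2_{d2} / d2), a noncentral F."""
    # Noncentral chi-square: a normal shifted by sqrt(delta), squared,
    # plus an independent central chi-square with d1 - 1 degrees of freedom.
    num = (rng.gauss(0.0, 1.0) + math.sqrt(delta)) ** 2
    if d1 > 1:
        num += rng.gammavariate((d1 - 1) / 2.0, 2.0)   # chi2_k is Gamma(k/2, scale 2)
    den = rng.gammavariate(d2 / 2.0, 2.0)
    return (num / d1) / (den / d2)

def upper_tail(stat, d1, d2, delta, draws=20000, seed=0):
    """Monte Carlo estimate of P(J1 > stat) when J1 ~ F_{d1,d2}(delta)."""
    rng = random.Random(seed)
    hits = sum(draw_noncentral_f(rng, d1, d2, delta) > stat for _ in range(draws))
    return hits / draws

d1, d2 = 32, 342 - 32 - 1          # N = 32 portfolios, T = 342 months, K = 1
cases = {"null (6.6.26)": 0.0, "risk-based (6.6.27)": 7.2,
         "nonrisk (6.6.28)": 39.4, "nonrisk (6.6.29)": 80.3}
tails = {name: upper_tail(1.91, d1, d2, delta) for name, delta in cases.items()}
for name, p in tails.items():
    print(name, p)
```

The risk-based estimate should land near the p-value of 0.03 quoted above, with Monte Carlo error of roughly one percentage point or less at this number of draws.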

The data offer some support for the nonrisk-based alternative views. The test statistic falls almost in the middle of the nonrisk-based alternative with the lower standard deviation of the elements of $a$. Several of the nonrisk-based alternatives could equally well explain the results. Different nonrisk-based views can give the same noncentrality parameter and test-statistic distribution. The results are consistent with the data-snooping alternative of Lo and MacKinlay (1990b), with the related sample selection biases discussed by Breen and Korajczyk (1993) and Kothari, Shanken, and Sloan (1995), and with the presence of market inefficiencies.

⁵See MacKinlay (1987) for a detailed analysis of the risk-based alternative.


6.7 Conclusion 


In this chapter we have developed the econometrics for estimating and testing multifactor pricing models. These models provide an attractive alternative to the single-factor CAPM, but users of such models should be aware of two serious dangers that arise when factors are chosen to fit existing data without regard to economic theory. First, the models may overfit the data because of data-snooping bias; in this case they will not be able to predict asset returns in the future. Second, the models may capture empirical regularities that are due to market inefficiency or investor irrationality; in this case they may continue to fit the data but they will imply Sharpe ratios for factor portfolios that are too high to be consistent with a reasonable underlying model of market equilibrium. Both these problems can be mitigated if one derives a factor structure from an equilibrium model, along the lines discussed in Chapter 8. In the end, however, the usefulness of multifactor models will not be fully known until sufficient new data become available to provide a true out-of-sample check on their performance.


Problems—Chapter 6 


6.1 Consider a multiple regression of the return on any asset or portfolio $R_a$ on the returns of any set of portfolios from which the entire minimum-variance boundary can be generated. Show that the intercept of this regression will be zero and that the factor regression coefficients for any asset will sum to unity.


6.2 Consider two economies, economy A and economy B. The mean excess return vector and the covariance matrix are specified below for each of the economies. Assume there exist a riskfree asset, N risky assets with mean excess return $\mu$ and nonsingular covariance matrix $\Omega$, and a risky factor portfolio with mean excess return $\mu_p$ and variance $\sigma_p^2$. The factor portfolio is not a linear combination of the N assets. (This criterion can be met by eliminating one of the assets which is included in the factor portfolio if necessary.) For both economies A and B:

$$\mu = a + \beta\mu_p, \qquad (6.7.1)$$
$$\Omega = \beta\beta'\sigma_p^2 + \delta\delta' + \sigma_\epsilon^2 I. \qquad (6.7.2)$$

Given the above mean and covariance matrix and the assumption that the factor portfolio p is a traded asset, what is the maximum squared Sharpe ratio for the given economies?


6.3 Returning to the above problem, the economies are further specified. Assume the elements of $a$ are cross-sectionally independent and identically distributed,

$$a_i \sim \mathrm{IID}\; N(0, \sigma_a^2), \quad i = 1, \ldots, N. \qquad (6.7.3)$$

The specification of the distribution of the elements of $\delta$ conditional on $a$ differentiates economies A and B. For economy A:

$$\delta_i \mid a_i \sim N(a_i, 0), \quad i = 1, \ldots, N, \qquad (6.7.4)$$

and for economy B:

$$\delta_i \mid a_i \sim \mathrm{IID}\; N(0, \sigma_a^2), \quad i = 1, \ldots, N. \qquad (6.7.5)$$

Unconditionally the cross-sectional distribution of the elements of $\delta$ will be the same for both economies, but for economy A, conditional on $a$, $\delta$ is fixed. What is the maximum squared Sharpe ratio for each economy? What is the maximum squared Sharpe ratio for each economy as N increases to infinity?


Present-Value Relations 


THE FIRST PART of this book has examined the behavior of stock returns in some detail. The exclusive focus on returns is traditional in empirical research on asset pricing; yet it belies the name of the field to study only returns and not to say anything about asset prices themselves. Many of the most important applications of financial economics involve valuing assets, and for these applications it is essential to be able to calculate the prices that are implied by models of returns. In this chapter we discuss recent research that tries to bring attention back to price behavior. We deal with common stock prices throughout, but of course the concepts developed in this chapter are applicable to other assets as well.

The basic framework for our analysis is the discounted-cash-flow or present-value model. This model relates the price of a stock to its expected future cash flows (its dividends) discounted to the present using a constant or time-varying discount rate. Since dividends in all future periods enter the present-value formula, the dividend in any one period is only a small component of the price. Therefore long-lasting or persistent movements in dividends have much larger effects on prices than temporary movements do. A similar insight applies to variation in discount rates. The discount rate between any one period and the next is only a small component of the long-horizon discount rate applied to a distant future cash flow; therefore persistent movements in discount rates have much larger effects on prices than temporary movements do. For this reason the study of asset prices is intimately related to the study of long-horizon asset returns. Section 7.1 uses the present-value model to discuss these links between movements in prices, dividends, and returns.

We mentioned at the end of Chapter 2 that there is some evidence for predictability of stock returns at long horizons. This evidence is statistically weak when only past returns are used to forecast future returns, as in Chapter 2, but it becomes considerably stronger when other variables, such as the dividend-price ratio or the level of interest rates, are brought into the analysis. In Section 7.2, we use the formulas of Section 7.1 to help interpret these findings. We show how various test statistics will behave, both under the null hypothesis and under the simple alternative hypothesis that the expected stock return is time-varying and follows a persistent first-order autoregressive (AR(1)) process. A major theme of the section is that recent empirical findings using longer-horizon data are roughly consistent with this persistent AR(1) alternative model. We also develop the implications of the AR(1) model for price behavior. Persistent movements in expected returns have dramatic effects on stock prices, making them much more volatile than they would be if expected returns were constant.

The source of this persistent variation in expected stock returns is an important unresolved issue. One view is that the time-variation in expected returns and the associated volatility of stock prices are evidence against the Efficient Markets Hypothesis (EMH). But as we argued in Chapter 1, the EMH can only be tested in conjunction with a model of equilibrium returns. This chapter describes evidence against the joint hypothesis that the EMH holds and that equilibrium stock returns are constant, but it leaves open the possibility that a model with time-varying equilibrium stock returns can be constructed to fit the data. We explore this possibility further in Chapter 8.


7.1 The Relation between Prices, Dividends, and Returns 


In this section we discuss the present-value model of stock prices. Using the identity that relates stock prices, dividends, and returns, Section 7.1.1 presents the expected-present-value formula for a stock with constant expected returns. Section 7.1.1 assumes away the possibility that there are so-called rational bubbles in stock prices, but this possibility is considered in Section 7.1.2. Section 7.1.3 studies the general case where expected stock returns vary through time. The exact present-value formula is nonlinear in this case, but a loglinear approximation yields some useful insights. Section 7.1.4 develops a simple example in which the expected stock return is time-varying and follows an AR(1) process.

We first recall the definition of the return on a stock given in Chapter 1. The net simple return is

$$R_{t+1} \equiv \frac{P_{t+1} + D_{t+1}}{P_t} - 1. \qquad (7.1.1)$$

This definition is straightforward, but it does use two notational conventions that deserve emphasis. First, $R_{t+1}$ denotes the return on the stock held from time t to time t+1. The subscript t+1 is used because the return only becomes known at time t+1. Second, $P_t$ denotes the price of a share of stock measured at the end of period t, or equivalently an ex-dividend price:
Purchase of the stock at price $P_t$ today gives one a claim to next period's dividend per share $D_{t+1}$ but not to this period's dividend $D_t$.¹

An alternative measure of return is the log or continuously compounded return, defined in Chapter 1 as

$$r_{t+1} \equiv \log(1 + R_{t+1}). \qquad (7.1.2)$$

Here, as throughout this chapter, we use lowercase letters to denote log variables.
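As a small sketch of (7.1.1) and (7.1.2), the functions below compute both return measures under the ex-dividend timing convention just described. The prices and dividend in the usage line are hypothetical.

```python
import math

def simple_return(p_t, p_next, d_next):
    """Net simple return (7.1.1): (P_{t+1} + D_{t+1}) / P_t - 1, with P_t ex-dividend."""
    return (p_next + d_next) / p_t - 1.0

def log_return(p_t, p_next, d_next):
    """Log (continuously compounded) return (7.1.2): log(1 + R_{t+1})."""
    return math.log1p(simple_return(p_t, p_next, d_next))

# Hypothetical numbers: buy ex-dividend at 100, receive a dividend of 2, sell at 105
print(simple_return(100.0, 105.0, 2.0))   # about 0.07
print(log_return(100.0, 105.0, 2.0))      # log(1.07), about 0.0677
```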


7.1.1 The Linear Present-Value Relation with Constant Expected Returns 


In this section we explore the consequences of the assumption that the expected stock return is equal to a constant R:

$$E_t[R_{t+1}] = R. \qquad (7.1.3)$$

Taking expectations of the identity (7.1.1), imposing (7.1.3), and rearranging, we obtain an equation relating the current stock price to the next period's expected stock price and dividend:

$$P_t = E_t\left[\frac{P_{t+1} + D_{t+1}}{1 + R}\right]. \qquad (7.1.4)$$


This expectational difference equation can be solved forward by repeatedly substituting out future prices and using the Law of Iterated Expectations (the result that $E_t[E_{t+1}[X]] = E_t[X]$, discussed in Chapter 1) to eliminate future-dated expectations. After solving forward K periods we have

$$P_t = E_t\left[\sum_{i=1}^{K}\left(\frac{1}{1+R}\right)^i D_{t+i}\right] + E_t\left[\left(\frac{1}{1+R}\right)^K P_{t+K}\right]. \qquad (7.1.5)$$

The second term on the right-hand side of (7.1.5) is the expected discounted value of the stock price K periods from the present. For now, we assume that this term shrinks to zero as the horizon K increases:

$$\lim_{K\to\infty} E_t\left[\left(\frac{1}{1+R}\right)^K P_{t+K}\right] = 0. \qquad (7.1.6)$$

¹These timing assumptions are standard in the finance literature. However, some of the literature on volatility tests, for example Shiller (1981) and Campbell and Shiller (1987, 1988a,b), uses the alternative timing convention that the stock price is measured at the beginning of the period or traded cum-dividend. Differences between the formulas given in this chapter and those in the original volatility papers are due to this difference in timing conventions.
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Assumption (7.1.6) will be satisfied unless the stock price is expected to 
grow forever at rate FR or faster, In Section 7.1.2 below, we discuss models 
of rational bubbles hat relax this assumption. 

Letting $K$ increase in (7.1.5) and assuming (7.1.6), we obtain a formula
expressing the stock price as the expected present value of future dividends
out to the infinite future, discounted at a constant rate. For future convenience we write this expected present value as $P_{Dt}$:


$$P_t = P_{Dt} \equiv E_t\left[\sum_{i=1}^{\infty}\left(\frac{1}{1+R}\right)^{i} D_{t+i}\right]. \qquad (7.1.7)$$


An unrealistic special case that nevertheless provides some useful intuition occurs when dividends are expected to grow at a constant rate $G$
(which must be smaller than $R$ to keep the stock price finite):


$$E_t[D_{t+i}] = (1+G)^{i} D_t. \qquad (7.1.8)$$


Substituting (7.1.8) into (7.1.7), we obtain the well-known Gordon growth
model (Gordon [1962]) for the price of a stock with a constant discount
rate $R$ and dividend growth rate $G$, where $G < R$:


$$P_t = \frac{E_t[D_{t+1}]}{R-G} = \frac{(1+G)\,D_t}{R-G}. \qquad (7.1.9)$$


The Gordon growth model shows that the stock price is extremely sensitive
to a permanent change in the discount rate when $R$ is close to $G$, since the
elasticity of the price with respect to the discount rate is $(dP_t/P_t)/(dR/R) = -R/(R-G)$.
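A quick numerical check of (7.1.9) and of the elasticity formula may help. The Python sketch below truncates the sum (7.1.7) at a long horizon under the constant-growth assumption (7.1.8) and compares it with the closed form. The values $R = 6\%$, $G = 4\%$, and $D_t = 1$ are hypothetical, and the function names are illustrative only.

```python
def gordon_price(d_t, R, G):
    """Closed form (7.1.9): P_t = (1 + G) D_t / (R - G), valid only for G < R."""
    if G >= R:
        raise ValueError("the Gordon growth model requires G < R")
    return (1 + G) * d_t / (R - G)


def truncated_present_value(d_t, R, G, horizon=20_000):
    """(7.1.7) with E_t[D_{t+i}] = (1 + G)^i D_t from (7.1.8), summed to a finite horizon."""
    growth_over_discount = (1 + G) / (1 + R)
    return sum(d_t * growth_over_discount ** i for i in range(1, horizon + 1))


def discount_rate_elasticity(R, G, h=1e-7):
    """Central-difference estimate of (dP/P)/(dR/R), to compare with -R/(R - G)."""
    slope = (gordon_price(1.0, R + h, G) - gordon_price(1.0, R - h, G)) / (2 * h)
    return slope * R / gordon_price(1.0, R, G)


R, G = 0.06, 0.04
print(gordon_price(1.0, R, G), truncated_present_value(1.0, R, G))  # both about 52
print(discount_rate_elasticity(R, G), -R / (R - G))                 # both about -3
print(discount_rate_elasticity(0.05, G))                            # about -5 as R nears G
```

Moving $R$ from 6% to 5% with $G$ fixed at 4% raises the magnitude of the elasticity from 3 to 5. This is the sensitivity to $R$ near $G$ described above.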

It is important to avoid two common errors in interpreting these formulas. First, note that we have made no assumptions about equity repurchases
by firms. Equity repurchases affect the time pattern of expected future dividends per share in (7.1.7), but they do not affect the validity of the formula
itself. Problem 7.1 explores this point in more detail.

Second, the hypothesis that the expected stock return is constant
through time is sometimes known as the martingale model of stock prices. But
a constant expected stock return does not imply a martingale for the stock
price itself. Recall that a martingale for the price requires $E_t[P_{t+1}] = P_t$,
whereas (7.1.4) implies that


$$E_t[P_{t+1}] = (1+R)\,P_t - E_t[D_{t+1}]. \qquad (7.1.10)$$


See Chapter 2 for a discussion of the martingale hypothesis. LeRoy (1989) surveys the martingale literature from Samuelson (1965) on. More general martingale results for risk-neutralized price processes are discussed in Chapter 9.

The expected stock price next period does not equal the stock price today, as
would be required if the stock price were a martingale. Rather, the expected
future stock price equals one plus the constant required return, $(1 + R)$,
times the current stock price, less an adjustment for dividend payments.
To obtain a martingale, we must construct a portfolio for which all dividend
payments are reinvested in the stock. At time $t$, this portfolio will have $N_t$
shares of the stock, where


$$N_{t+1} = N_t\left(1 + \frac{D_{t+1}}{P_{t+1}}\right). \qquad (7.1.11)$$

The value of this portfolio at time $t$, discounted to time 0 at rate $R$, is

$$M_t = \frac{N_t P_t}{(1+R)^{t}}. \qquad (7.1.12)$$


It is straightforward to show that $M_t$ is a martingale.
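The martingale property of $M_t$ can be checked by brute force in a small example. Purely for illustration, the sketch assumes that dividend growth is i.i.d. and takes one of two values with equal probability. Then $E_t[D_{t+i}] = (1+G)^i D_t$, and the price is given by (7.1.9) with $G$ the mean growth rate. The sketch enumerates the two possible next-period states to compute the conditional expectations of $P_{t+1}$ and $M_{t+1}$ exactly. The growth states and the function names are hypothetical.

```python
R = 0.06                                        # constant required return
GROWTH_STATES = [(-0.02, 0.5), (0.10, 0.5)]     # hypothetical (growth rate, probability)
G = sum(g * prob for g, prob in GROWTH_STATES)  # mean dividend growth, 4% here


def price(d):
    """With i.i.d. growth, E_t[D_{t+i}] = (1+G)^i D_t, so (7.1.9) prices the stock."""
    return (1 + G) * d / (R - G)


def one_step_expectations(d_t, n_t, t):
    """Return P_t, E_t[P_{t+1}], M_t and E_t[M_{t+1}] by enumerating next-period states."""
    p_t = price(d_t)
    m_t = n_t * p_t / (1 + R) ** t                    # (7.1.12)
    e_p_next = e_m_next = 0.0
    for g, prob in GROWTH_STATES:
        d_next = d_t * (1 + g)
        p_next = price(d_next)
        n_next = n_t * (1 + d_next / p_next)          # (7.1.11): dividend reinvested in shares
        e_p_next += prob * p_next
        e_m_next += prob * n_next * p_next / (1 + R) ** (t + 1)
    return p_t, e_p_next, m_t, e_m_next


p_t, e_p, m_t, e_m = one_step_expectations(d_t=1.0, n_t=1.0, t=3)
print(p_t, e_p)  # E_t[P_{t+1}] = (1+R)P_t - E_t[D_{t+1}] = (1+G)P_t, above P_t
print(m_t, e_m)  # E_t[M_{t+1}] equals M_t
```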

Even though the stock price $P_t$ is not generally a martingale, it will follow
a linear process with a unit root if the dividend $D_t$ follows a linear process
with a unit root. In this case the expected present-value formula (7.1.7)
relates two unit-root processes for $P_t$ and $D_t$. It can be transformed into a
relation between stationary variables, however, by subtracting a multiple of
the dividend from both sides of the equation. We get


$$P_t - \frac{D_t}{R} = \frac{1}{R}\, E_t\left[\sum_{i=0}^{\infty}\left(\frac{1}{1+R}\right)^{i} \Delta D_{t+1+i}\right]. \qquad (7.1.13)$$


Equation (7.1.13) relates the difference between the stock price and 1/R 
times the dividend to the expectation of the discounted value of future 
changes in dividends, which is stationary if changes in dividends are station- 
ary. In this case, even though the dividend process is nonstationary and the 
price process is nonstationary, there is a stationary linear combination of 
prices and dividends, so that prices and dividends are cointegrated.
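Equation (7.1.13) can be checked numerically once a process for dividend changes is specified. The sketch below assumes, hypothetically, an AR(1) process for dividend changes, $\Delta D_{t+1} - \mu = a(\Delta D_t - \mu) + \varepsilon_{t+1}$. It computes $P_t$ from the level formula (7.1.7) and compares $P_t - D_t/R$ with the right-hand side of (7.1.13). Both infinite sums are truncated at a long horizon, which is an approximation. The parameter values and function names are illustrative.

```python
def expected_change(i, dd_t, mu, a):
    """E_t[Delta D_{t+i}], i >= 1, under a hypothetical AR(1) for dividend changes."""
    return mu + a ** i * (dd_t - mu)


def price_from_levels(d_t, dd_t, R, mu, a, horizon=5_000):
    """(7.1.7) truncated at `horizon`, building E_t[D_{t+i}] from expected changes."""
    total, expected_level = 0.0, d_t
    for i in range(1, horizon + 1):
        expected_level += expected_change(i, dd_t, mu, a)
        total += expected_level / (1 + R) ** i
    return total


def spread_from_changes(dd_t, R, mu, a, horizon=5_000):
    """Right-hand side of (7.1.13): (1/R) sum_{i>=0} (1+R)^{-i} E_t[Delta D_{t+1+i}]."""
    return sum(expected_change(1 + i, dd_t, mu, a) / (1 + R) ** i
               for i in range(horizon)) / R


d_t, dd_t, R, mu, a = 2.0, 0.15, 0.05, 0.02, 0.6   # hypothetical values
lhs = price_from_levels(d_t, dd_t, R, mu, a) - d_t / R
rhs = spread_from_changes(dd_t, R, mu, a)
print(lhs, rhs)  # agree up to floating-point and truncation error
```

The price level depends on the nonstationary $D_t$, but the spread on the left depends only on the current dividend change. That is why the spread is stationary when $\Delta D_t$ is.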


In the special case where dividends are expected to grow at a constant rate $G$, this simplifies
to $E_t[P_{t+1}] = (1+G)\,P_t$. The stock price is expected to grow at the same rate as the dividend,
because the dividend-price ratio is constant in this case.

Loosely, a variable follows a stationary time-series process if shocks to the variable have
temporary but not permanent effects. A variable follows a process with a unit root, also known
as an integrated process, if shocks have permanent effects on the level of the variable, but not
on the change in the variable. In this case the first difference of the variable is stationary, but
the level is not. A martingale is a unit-root process where the immediate effect of a shock is
the same as the permanent effect. See Chapter 2 or a textbook in time-series analysis such as
Hamilton (1994) for precise definitions of these concepts.

Two variables with unit roots are cointegrated if some linear combination of the variables
is stationary. See Engle and Granger (1987) or Hamilton (1994) for general discussion, or
Campbell and Shiller (1987) for this application of the concept. Note that here the stationary
linear combination of the variables involves the constant discount rate $R$, which generally is
not known a priori.

Although this formulation of the expected present-value model has
been explored empirically by Campbell and Shiller (1987), West (1988b),
and others, stock prices and dividends are like many other macroeconomic
time series in that they appear to grow exponentially over time rather than
linearly. This means that a linear model, even one that allows for a unit root,
is less appropriate than a loglinear model. Below we develop a present-value
framework that is appropriate when dividends follow a loglinear process.


7.1.2 Rational Bubbles


In the previous section we obtained an expectational difference equation,
(7.1.4), and solved it forward to show that the stock price must equal $P_{Dt}$,
the expected present value of future dividends. The argument relied on
the assumption (7.1.6) that the expected discounted stock price, $K$ periods
in the future, converges to zero as the horizon $K$ increases. In this section
we discuss models that relax this assumption.

The convergence assumption (7.1.6) is essential for obtaining a unique
solution $P_{Dt}$ to (7.1.4). Once we drop the assumption, there is an infinite
number of solutions to (7.1.4). Any solution can be written in the form


$$P_t = P_{Dt} + B_t, \qquad (7.1.14)$$

where

$$B_t = E_t\left[\frac{B_{t+1}}{1+R}\right]. \qquad (7.1.15)$$


The additional term $B_t$ in (7.1.14) appears in the price only because it is
expected to be present next period, with an expected value $(1 + R)$ times
its current value.

The term $P_{Dt}$ is sometimes called fundamental value, and the term $B_t$
is often called a rational bubble. The word "bubble" recalls some of the
famous episodes in financial history in which asset prices rose far higher
than could easily be explained by fundamentals, and in which investors
appeared to be betting that other investors would drive prices even higher
in the future. The adjective "rational" is used because the presence of
$B_t$ in (7.1.14) is entirely consistent with rational expectations and constant
expected returns.


Mackay (1852) is a classic reference on early episodes such as the Dutch tulipmania
in the 17th Century and the London South Sea Bubble and Paris Mississippi Bubble in the
18th Century. Kindleberger (1989) describes these and other more recent episodes, while
Garber (1989) argues that Dutch tulip prices were more closely related to fundamentals than
is commonly realized.

It is easiest to illustrate the idea of a rational bubble with an example.
Blanchard and Watson (1982) suggest a bubble of the form


$$B_{t+1} = \begin{cases} \left(\dfrac{1+R}{\pi}\right) B_t + \zeta_{t+1}, & \text{with probability } \pi; \\[4pt] \zeta_{t+1}, & \text{with probability } 1-\pi. \end{cases} \qquad (7.1.16)$$


This obeys the restriction (7.1.15), provided that the shock $\zeta_{t+1}$ satisfies
$E_t[\zeta_{t+1}] = 0$. The Blanchard and Watson bubble has a constant probability,
$1 - \pi$, of bursting in any period. If it does not burst, it grows at a rate
$(1+R)/\pi - 1$, faster than $R$, in order to compensate for the probability of bursting.
Many other bubble examples can be constructed; Problem 7.2 explores an
example suggested by Froot and Obstfeld (1991), in which the bubble is a
nonlinear function of the stock's dividend.
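The sketch below draws many realizations of $B_{t+1}$ from a given $B_t$ under (7.1.16). It checks that their average is close to $(1+R)B_t$, as (7.1.15) requires. The normal distribution for $\zeta_{t+1}$, the parameter values, and the function name are assumptions for illustration only, since (7.1.16) requires only that $\zeta_{t+1}$ have conditional mean zero.

```python
import numpy as np


def bubble_draws(b_t, R, pi, n, rng, zeta_sd=0.1):
    """n independent draws of B_{t+1} given B_t under (7.1.16).

    zeta_{t+1} is taken to be N(0, zeta_sd^2); the normal form is an assumption,
    since (7.1.16) only requires E_t[zeta_{t+1}] = 0.
    """
    survives = rng.random(n) < pi                         # no burst with probability pi
    zeta = rng.normal(0.0, zeta_sd, size=n)
    return np.where(survives, (1 + R) / pi * b_t, 0.0) + zeta


R, pi, b_t = 0.05, 0.9, 1.0                               # hypothetical parameters
rng = np.random.default_rng(0)
draws = bubble_draws(b_t, R, pi, n=200_000, rng=rng)
print(draws.mean(), (1 + R) * b_t)    # sample mean close to (1+R) B_t, as (7.1.15) requires
print((1 + R) / pi - 1)               # growth rate while the bubble survives, about 16.7% > R
```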

Although rational bubbles have attracted considerable attention, there
are both theoretical and empirical arguments that can be used to rule out
bubble solutions to the difference equation (7.1.4). Theoretical arguments
may be divided into partial-equilibrium arguments and general-equilibrium
arguments.

In partial equilibrium, the first point to note is that there can never be a
negative bubble on an asset with limited liability. If a negative bubble were
to exist, it would imply a negative expected asset price at some date in the
future, and this would be inconsistent with limited liability. A second important point follows from this: A bubble on a limited-liability asset cannot
start within an asset pricing model. It must have existed since asset trading
began if it exists today. The reason is that if the bubble ever has a zero value,
its expected future value is also zero by condition (7.1.15). But since the
bubble can never be negative, it can only have a zero expectation if it is zero
in the future with probability one (Diba and Grossman (1988)).

Third, a bubble cannot exist if there is any upper limit on the price of
an asset. Thus a commodity-price bubble is ruled out by the existence of
some high-priced substitute in infinitely elastic supply (for example, solar
energy in the case of oil). Stock-price bubbles may be ruled out if firms
impose an upper limit on stock prices by issuing stock in response to price
increases. Finally, bubbles cannot exist on assets such as bonds which have
a fixed value on a terminal date.

General-equilibrium considerations also limit the possibilities for rational bubbles. Tirole (1982) has shown that bubbles cannot exist in a model
with a finite number of infinite-lived rational agents. The argument is easiest to see when short sales are allowed, although it does not in fact depend
on the possibility of short sales. If a positive bubble existed in an asset,
infinite-lived agents could sell the asset short, invest some of the proceeds
to pay the dividend stream, and have positive wealth left over. This arbitrage
opportunity rules out bubbles.

Tirole (1985) has studied the possibility of bubbles within the Diamond
(1965) overlapping-generations model. In this model there is an infinite
number of finite-lived agents, but Tirole shows that even here a bubble
cannot arise when the interest rate exceeds the growth rate of the economy,
because the bubble would eventually become infinitely large relative to the
wealth of the economy. This would violate some agents' budget constraints.
Thus bubbles can only exist in dynamically inefficient overlapping-generations
economies that have overaccumulated private capital, driving the interest
rate down below the growth rate of the economy. Many economists feel
that dynamic inefficiency is unlikely to occur in practice, and Abel, Mankiw,
Summers, and Zeckhauser (1989) present empirical evidence that it does
not describe the US economy.

There are also some empirical arguments against the existence of bubbles. The most important point is that bubbles imply explosive behavior of
various series. In the absence of bubbles, if the dividend $D_t$ follows a linear
process with a unit root, then the stock price $P_t$ has a unit root while the
change in the price $\Delta P_t$ and the spread between price and a multiple of
dividends $P_t - D_t/R$ are stationary. With bubbles, these variables all have an
explosive conditional expectation: $\lim_{i\to\infty} E_t[(1+R)^{-i} X_{t+i}] > 0$ for
$X_t = P_t$, $\Delta P_t$, or $P_t - D_t/R$. Empirically, there is little evidence of explosive
behavior in these series. A caveat is that stochastic bubbles are nonlinear,
so standard linear methods may fail to detect the explosive behavior of the
conditional expectation in these models.

Finally, we note that rational bubbles cannot explain the observed predictability of stock returns. Bubbles create volatility in prices without creating predictability in returns. To the extent that price volatility can be
explained by return predictability, the bubble hypothesis is superfluous.

Although rational bubbles may be implausible, there is much to be
learned from studying them. An important theme of this chapter is that
small movements in expected returns can have large effects on prices if they
are persistent. Conversely, large persistent swings in prices can have small
effects on expected returns in any one period. A rational bubble can be
seen as the extreme case where price movements are so persistent (indeed,
explosive) that they have no effects on expected returns at all.


7.1.3 An Approximate Present-Value Relation with Time-Varying Expected Returns 


So far we have assumed that expected stock returns are constant. This 
assumption is analytically convenient, but it contradicts the evidence in 
Chapter 2 and in Section 7.2 that stock returns are predictable.

It is much more difficult to work with present-value relations when expected stock returns are time-varying, for then the relation between prices
and returns becomes nonlinear. One approach is to use a loglinear approximation, as suggested by Campbell and Shiller (1988a,b). The loglinear relation between prices, dividends, and returns provides an accounting
framework: High prices must eventually be followed by high future dividends, low future returns, or some combination of the two. Investors'
expectations must be consistent with this, so high prices must be associated
with high expected future dividends, low expected future returns, or some
combination of the two. Similarly, high returns must be associated with upward revisions in expected future dividends, downward revisions in expected
future returns, or some combination of the two (Campbell (1991)).

Thus the loglinear framework enables us to calculate asset price behavior under any model of expected returns, rather than just the model with
constant expected returns. The loglinear framework has the additional ad- 
vantage that it is tractable under the empirically plausible assumption that 
dividends and returns follow loglinear driving processes. Later in this chap- 
ter we use the loglinear framework to interpret the large empirical literature 
on predictability in long-horizon stock returns. 

The loglinear approximation starts with the definition of the log stock
return $r_{t+1}$. Using (7.1.1), (7.1.2), and the convention that logs of variables
are denoted by lowercase letters, we have


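$$\begin{aligned} r_{t+1} &\equiv \log(P_{t+1} + D_{t+1}) - \log(P_t) \\ &= p_{t+1} - p_t + \log\bigl(1 + \exp(d_{t+1} - p_{t+1})\bigr). \end{aligned} \qquad (7.1.17)$$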


The last term on the right-hand side of (7.1.17) is a nonlinear function of the
log dividend-price ratio, $f(d_{t+1} - p_{t+1})$. Like any nonlinear function $f(x_{t+1})$,
it can be approximated around the mean of $x_{t+1}$, $\bar{x}$, using a first-order Taylor
expansion:


$$f(x_{t+1}) \approx f(\bar{x}) + f'(\bar{x})\,(x_{t+1} - \bar{x}). \qquad (7.1.18)$$

Substituting this approximation into (7.1.17), we obtain

$$r_{t+1} \approx k + \rho\, p_{t+1} + (1-\rho)\, d_{t+1} - p_t, \qquad (7.1.19)$$

where $\rho$ and $k$ are parameters of linearization defined by $\rho \equiv 1/\bigl(1 + \exp(\overline{d-p})\bigr)$, where $\overline{d-p}$ is the average log dividend-price ratio, and $k \equiv -\log(\rho) - (1-\rho)\log(1/\rho - 1)$. When the dividend-price ratio is constant,
then $\rho = 1/(1 + D/P)$, the reciprocal of one plus the dividend-price ratio.
Empirically, in US data over the period 1926 to 1994 the average dividend-price ratio has been about 4% annually, implying that $\rho$ should be about 0.96
in annual data, or about 0.997 in monthly data. The Taylor approximation
(7.1.18) replaces the log of the sum of the stock price and the dividend in
(7.1.17) with a weighted average of the log stock price and the log dividend
in (7.1.19). The log stock price gets a weight $\rho$ close to one, while the log
dividend gets a weight $1 - \rho$ close to zero. This is because the dividend is on average
much smaller than the stock price, so a given proportional change in the dividend has a much smaller effect on the return than the same proportional
change in the price.
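The linearization constants are easy to compute. This sketch evaluates $\rho$ and $k$ from an assumed average log dividend-price ratio. Treating the monthly yield as the annual 4% divided by 12 is a simplifying assumption, and the function name is illustrative.

```python
import math


def linearization_constants(mean_log_dp):
    """rho and k of (7.1.19), given the average log dividend-price ratio."""
    rho = 1.0 / (1.0 + math.exp(mean_log_dp))
    k = -math.log(rho) - (1.0 - rho) * math.log(1.0 / rho - 1.0)
    return rho, k


# Hypothetical inputs: a 4% annual yield, and the same yield spread evenly over 12 months.
annual_rho, annual_k = linearization_constants(math.log(0.04))
monthly_rho, monthly_k = linearization_constants(math.log(0.04 / 12))
print(round(annual_rho, 3), round(monthly_rho, 3))  # 0.962 0.997
```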


Approximation Accuracy 

The approximation (7.1.19) holds exactly when the log dividend-price ratio
is constant, for then $d_{t+1}$ and $p_{t+1}$ move together one-for-one and equation
(7.1.19) is equivalent to equation (7.1.17). Like any other Taylor expansion,
the approximation (7.1.19) will be accurate provided that the variation in
the log dividend-price ratio is not too great. One can get a sense of the
accuracy of the approximation by comparing the exact return (7.1.17) with
the approximate return (7.1.19) in actual data. Using monthly nominal
dividends and prices on the CRSP value-weighted stock index over the period 1926:1 to 1994:12, for example, the exact and approximate returns
have means of 0.78% and 0.72% per month, standard deviations of 5.55%
and 5.56% per month, and a correlation of 0.99991. The approximation
error (the difference between the approximate and the exact return) has
a mean of -0.06%, a standard deviation of 0.08%, and a correlation of 0.08
with the exact return. Using annual nominal dividends and prices on the
CRSP value-weighted stock index over the period 1926 to 1994, the exact
and approximate returns have means of 9.20% and 9.03% per year, standard
deviations of 19.29% and 19.42% per year, and a correlation of 0.99993. The
approximation error has a mean of -0.17%, a standard deviation of 0.26%,
and a correlation of 0.51 with the exact return. Thus the approximation
misstates the average stock return but captures the dynamics of stock returns
well, especially when it is applied to monthly data.
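The exactness claim can be checked on artificial data. The sketch below computes the exact return (7.1.17) and the approximate return (7.1.19) for a hypothetical log price path. It does this first with a constant log dividend-price ratio and then with one that fluctuates around its mean. The series are simulated, not the CRSP data behind the figures above, and the function names are illustrative.

```python
import numpy as np


def exact_log_return(p, d):
    """(7.1.17) for arrays of log prices p and log dividends d; element t is r_{t+1}."""
    return p[1:] - p[:-1] + np.log1p(np.exp(d[1:] - p[1:]))


def approx_log_return(p, d):
    """(7.1.19), linearized around the sample mean of d - p."""
    mean_dp = np.mean(d[1:] - p[1:])
    rho = 1.0 / (1.0 + np.exp(mean_dp))
    k = -np.log(rho) - (1.0 - rho) * np.log(1.0 / rho - 1.0)
    return k + rho * p[1:] + (1.0 - rho) * d[1:] - p[:-1]


rng = np.random.default_rng(1)
T = 600
p = np.cumsum(rng.normal(0.005, 0.05, T))                 # hypothetical log price path
base = np.log(0.04 / 12)                                  # hypothetical monthly log D/P level

d_const = p + base                                        # constant log dividend-price ratio
err_const = approx_log_return(p, d_const) - exact_log_return(p, d_const)

d_var = p + base + 0.2 * rng.standard_normal(T)           # log D/P varies around its mean
err_var = approx_log_return(p, d_var) - exact_log_return(p, d_var)

print(np.abs(err_const).max())        # zero up to rounding: exact when d - p is constant
print(err_var.mean(), err_var.std())  # small, and never positive
```

Because $\log(1 + e^{x})$ is convex, its tangent line lies below it. The linearized return therefore can never exceed the exact return, which is consistent with the approximate returns having the lower means in the CRSP comparison above.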




Implications for Prices 

Equation (7.1.19) is a linear difference equation for the log stock price,
analogous to the linear difference equation for the level of the stock price 
that we obtained in (7.1.4) under the assumption of constant expected 
returns. Solving forward and imposing the condition that 


$$\lim_{j\to\infty} \rho^{j} p_{t+j} = 0, \qquad (7.1.20)$$


we obtain 


$$p_t = \frac{k}{1-\rho} + \sum_{j=0}^{\infty} \rho^{j}\bigl[(1-\rho)\, d_{t+1+j} - r_{t+1+j}\bigr]. \qquad (7.1.21)$$

One can also compare exact and approximate real returns. The correction for inflation
has no important effects on the comparison. See Campbell and Shiller (1988a) for a more
detailed evaluation of approximation accuracy at short and long horizons.

Equation (7.1.21) is a dynamic accounting identity; it has been obtained
merely by approximating an identity and solving forward subject to a terminal condition. The terminal condition (7.1.20) rules out rational bubbles
that would cause the log stock price to grow exponentially forever at rate $1/\rho$
or faster. Equation (7.1.21) shows that if the stock price is high today, then
there must be some combination of high dividends and low stock returns
in the future.
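Equation (7.1.21) is an accounting identity rather than an economic model, and this can be seen directly. If returns are defined by the linearized relation (7.1.19), iterating forward reproduces today's log price exactly for any price and dividend path. The sketch below uses a finite horizon $J$ and keeps the terminal term $\rho^J p_{t+J}$. The data are simulated, and the function name is illustrative.

```python
import numpy as np


def rebuild_log_price(p, d, rho, k):
    """Iterate (7.1.19) forward J = len(p) - 1 periods from t = 0.

    Returns are the linearized returns of (7.1.19) themselves, so the finite-horizon
    version of (7.1.21),
        p_0 = sum_{j<J} rho^j [k + (1-rho) d_{1+j} - r_{1+j}] + rho^J p_J,
    holds exactly.  As J grows and (7.1.20) holds, the k terms sum to k/(1-rho)
    and the terminal term vanishes.
    """
    r = k + rho * p[1:] + (1 - rho) * d[1:] - p[:-1]
    J = len(r)
    weights = rho ** np.arange(J)
    return np.sum(weights * (k + (1 - rho) * d[1:] - r)) + rho ** J * p[-1]


rng = np.random.default_rng(2)
p = np.cumsum(rng.normal(0.0, 0.05, 241))                       # hypothetical log prices
d = p + np.log(0.04 / 12) + 0.1 * rng.standard_normal(241)      # hypothetical log dividends
rho = 0.997
k = -np.log(rho) - (1 - rho) * np.log(1 / rho - 1)
print(p[0], rebuild_log_price(p, d, rho, k))  # the two numbers coincide
```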

Equation (7.1.21) holds ex post, but it also holds ex ante. Taking expectations of (7.1.21), and noting that $p_t = E_t[p_t]$ because $p_t$ is known at time
$t$, we obtain


$$p_t = \frac{k}{1-\rho} + E_t\left[\sum_{j=0}^{\infty} \rho^{j}\bigl[(1-\rho)\, d_{t+1+j} - r_{t+1+j}\bigr]\right]. \qquad (7.1.22)$$


This should be thought of as a consistency condition for expectations, analogous to the statement that the expectations of random variables $X$ and $Y$
should add up to the expectation of the sum $X + Y$. If the stock price is high
today, then investors must be expecting some combination of high future
dividends and low future returns. Equation (7.1.22) is a dynamic generalization of the Gordon formula for a stock price with constant required returns
and dividend growth. Campbell and Shiller (1988a,b) call (7.1.22), and
(7.1.24) below, the dynamic Gordon growth model or the dividend-ratio model.

Like the original Gordon growth model, the dynamic Gordon growth
model says that stock prices are high when dividends are expected to grow
rapidly or when dividends are discounted at a low rate. But the effect on
the stock price of a high dividend growth rate (or a low discount rate) now
depends on how long the dividend growth rate is expected to be high (or
how long the discount rate is expected to be low), whereas in the original
model these rates are assumed to be constant at their initial levels forever.
One can use the definitions of $\rho$ and $k$ to show that the dynamic Gordon
growth model reduces to the original Gordon growth model when dividend
growth rates and discount rates are constant.
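The reduction of the dynamic Gordon growth model to the original one can also be checked numerically. Assuming constant $R$ and $G$, the sketch below linearizes around the constant log dividend-price ratio implied by (7.1.9). It then plugs constant log growth $g = \log(1+G)$ and log return $r = \log(1+R)$ into (7.1.24) and compares the result with $\log(D_t/P_t)$ from (7.1.9). In this case $\rho$ works out to $(1+G)/(1+R)$. The parameter values and function names are hypothetical.

```python
import math


def gordon_log_dp(R, G):
    """log(D_t/P_t) implied by (7.1.9): D_t/P_t = (R - G)/(1 + G)."""
    return math.log((R - G) / (1 + G))


def dynamic_gordon_log_dp(R, G):
    """(7.1.24) with constant log growth g = log(1+G) and constant log return r = log(1+R).

    rho and k are linearized around the (constant) log dividend-price ratio.
    Returns the implied d_t - p_t together with rho.
    """
    r, g = math.log1p(R), math.log1p(G)
    rho = 1.0 / (1.0 + math.exp(gordon_log_dp(R, G)))
    k = -math.log(rho) - (1.0 - rho) * math.log(1.0 / rho - 1.0)
    # With constant r and g, E_t sum_j rho^j (r - g) = (r - g)/(1 - rho).
    return (-k + r - g) / (1.0 - rho), rho


R, G = 0.06, 0.04                     # hypothetical constant rates
dp, rho = dynamic_gordon_log_dp(R, G)
print(dp, gordon_log_dp(R, G))        # equal up to rounding
print(rho, (1 + G) / (1 + R))         # rho collapses to (1+G)/(1+R)
```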


For future convenience, we can simplify the notation in (7.1.22), rewriting it as

$$p_t = \frac{k}{1-\rho} + p_{dt} - p_{rt}, \qquad (7.1.23)$$

where $p_{dt}$ is the expected discounted value of $(1-\rho)$ times future log dividends in (7.1.22) and $p_{rt}$ is the expected discounted value of future log
stock returns. This parallels the notation we used for the constant-expected-return case in Section 7.1.1.


Campbell and Shiller (1988a) evaluate the accuracy of the approximation in (7.1.21).

Equation (7.1.22) can be rewritten in terms of the log dividend-price
ratio rather than the log stock price:


$$d_t - p_t = -\frac{k}{1-\rho} + E_t\left[\sum_{j=0}^{\infty} \rho^{j}\bigl[-\Delta d_{t+1+j} + r_{t+1+j}\bigr]\right]. \qquad (7.1.24)$$


The log dividend-price ratio is high when dividends are expected to grow
only slowly, or when stock returns are expected to be high. This equation
is useful when the dividend follows a loglinear unit-root process, so that
log dividends and log prices are nonstationary. In this case changes in
log dividends are stationary, so from (7.1.24) the log dividend-price ratio
is stationary provided that the expected stock return is stationary. Thus
log stock prices and dividends are cointegrated, and the stationary linear
combination of these variables involves no unknown parameters since it is
just the log ratio. This simple structure makes the loglinear model easier to
use in empirical work than the linear cointegrated model (7.1.13).

So far we have written asset prices as linear combinations of expected
future dividends and returns. We can use the same approach to write asset
returns as linear combinations of revisions in expected future dividends and
returns (Campbell (1991)). Substituting (7.1.22) into (7.1.19), we obtain


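$$\begin{aligned} r_{t+1} - E_t[r_{t+1}] &= E_{t+1}\left[\sum_{j=0}^{\infty}\rho^{j}\Delta d_{t+1+j}\right] - E_t\left[\sum_{j=0}^{\infty}\rho^{j}\Delta d_{t+1+j}\right] - E_{t+1}\left[\sum_{j=1}^{\infty}\rho^{j} r_{t+1+j}\right] + E_t\left[\sum_{j=1}^{\infty}\rho^{j} r_{t+1+j}\right] \\ &= (E_{t+1} - E_t)\left\{\sum_{j=0}^{\infty}\rho^{j}\Delta d_{t+1+j} - \sum_{j=1}^{\infty}\rho^{j} r_{t+1+j}\right\}. \end{aligned} \qquad (7.1.25)$$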


This equation shows that unexpected stock returns must be associated with
changes in expectations of future dividends or real returns. An increase in
expected future dividends is associated with a capital gain today, while an
increase in expected future returns is associated with a capital loss today.
The reason is that with a given dividend stream, higher future returns can
only be generated by future price appreciation from a lower current price.
For convenience, we can simplify the notation of (7.1.25) to


$$\nu_{t+1} \equiv r_{t+1} - E_t[r_{t+1}] = \eta_{d,t+1} - \eta_{r,t+1}, \qquad (7.1.26)$$


where $\nu_{t+1}$ is the unexpected stock return, $\eta_{d,t+1}$ is the change in expectations of future dividends in (7.1.25), and $\eta_{r,t+1}$ is the change in expectations
of future returns.


7.1.4 Prices and Returns in a Simple Example


The formulas developed in the previous section may be easier to understand
in the context of a simple example. Later we will argue that the example
is not only simple, but also empirically relevant. Suppose that the expected
log stock return is a constant $r$ plus an observable zero-mean variable $x_t$:


$$E_t[r_{t+1}] = r + x_t. \qquad (7.1.27)$$


We further assume that $x_t$ follows the first-order autoregressive (AR(1))
process

$$x_{t+1} = \phi\, x_t + \xi_{t+1}, \qquad -1 < \phi < 1. \qquad (7.1.28)$$

When the AR coefficient $\phi$ is close to one, we will say that the $x_t$ process
is highly persistent. Equation (7.1.28) implies that the variance of $x_t$ and
its innovation $\xi_{t+1}$, which we write as $\sigma_x^2$ and $\sigma_\xi^2$ respectively, are related by
$\sigma_\xi^2 = (1 - \phi^2)\,\sigma_x^2$.


Under these assumptions, it is straightforward to show that

$$p_t = \frac{k}{1-\rho} + p_{dt} - \frac{r}{1-\rho} - \frac{x_t}{1-\rho\phi}. \qquad (7.1.29)$$


Equation (7.1.29) gives the effect on the stock price of variation through
time in the expected stock return. The equation shows that a change in the
expected return has a greater effect on the stock price when the expected
return is persistent: Since $\rho$ is close to one, a 1% increase in the expected
return today reduces the stock price by about 2% if $\phi = 0.5$, by about 4% if
$\phi = 0.75$, and by about 10% if $\phi = 0.9$.
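The multipliers quoted above come from the factor $1/(1-\rho\phi)$ in (7.1.29). The sketch below evaluates this factor at $\rho = 1$, which reproduces the round numbers. It also evaluates it at $\rho = 0.997$, a hypothetical monthly value, which gives slightly smaller effects. The function name is illustrative.

```python
def price_response(phi, rho):
    """Percentage fall in p_t per 1% rise in x_t: the 1/(1 - rho*phi) factor in (7.1.29)."""
    return 1.0 / (1.0 - rho * phi)


for phi in (0.5, 0.75, 0.9):
    # rho = 1 gives the round numbers in the text; rho = 0.997 is a hypothetical monthly value
    print(phi, price_response(phi, 1.0), round(price_response(phi, 0.997), 2))
```

The same factor scales the standard deviation of $p_{rt}$ relative to that of $x_t$.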
This example illustrates an important point. The variability of expected
stock returns is measured by the standard deviation of $x_t$. If this standard
deviation is small, it is tempting to conclude that changing expected returns
have little influence on stock prices, in other words, that variability in $p_{rt}$
is small. Equation (7.1.29) shows that this conclusion is too hasty: The
standard deviation of $p_{rt}$ is the standard deviation of $x_t$ divided by $(1 - \rho\phi)$,
so if expected returns vary in a persistent fashion, $p_{rt}$ can be very variable
even when $x_t$ itself is not. This point was stated by Summers (1986), and
particularly forcefully by Shiller (1984):


Returns on speculative assets are nearly unforecastable; this fact is the
basis of the most important argument in the oral tradition against a role
for mass psychology in speculative markets. One form of this argument
claims that because real returns are nearly unforecastable, the real price
of stocks is close to the intrinsic value, that is, the present value with
constant discount rate of optimally forecasted future real dividends.

This argument... is one of the most remarkable errors in the history
of economic thought.


In our example the stock price can be written as the sum of two terms.
The first term is the expected discounted value of future dividends, $p_{dt}$. This
is not quite a random walk for the reasons given in Section 7.1.1 above, but
it is close to a random walk when the dividend stream is not too large or
variable. The second term is a stationary AR(1) process, $-x_t/(1-\rho\phi)$. This two-component description of stock prices is often found in the literature (see
Summers [1986], Fama and French [1988b], Poterba and Summers [1988],
and Jegadeesh [1991]).

The AR(1) example also yields a particularly simple formula for the
one-period stock return $r_{t+1}$. The general stock-return equation (7.1.25)
simplifies because the innovation in expected future stock returns, $\eta_{r,t+1}$, is
given by $\rho\,\xi_{t+1}/(1 - \rho\phi)$. Thus we have

$$r_{t+1} = r + x_t + \eta_{d,t+1} - \frac{\rho\,\xi_{t+1}}{1-\rho\phi}. \qquad (7.1.30)$$

To understand the implications of this expression, assume for simplicity
that news about dividends and about future returns, $\eta_{d,t+1}$ and $\xi_{t+1}$, are
uncorrelated. Then using the notation $\mathrm{Var}[\eta_{d,t+1}] = \sigma_d^2$ (so $\sigma_d^2$ represents
the variance of news about all future dividends, not the variance of the
current dividend), and using the fact that $\sigma_\xi^2 = (1 - \phi^2)\,\sigma_x^2$, we can calculate
the variance of $r_{t+1}$ as

$$\mathrm{Var}[r_{t+1}] = \sigma_d^2 + \sigma_x^2\,\frac{1 + \rho^2 - 2\rho\phi}{(1-\rho\phi)^2} \approx \sigma_d^2 + \frac{2\sigma_x^2}{1-\phi}, \qquad (7.1.31)$$

where the approximate equality holds when $\rho$ is close to one relative to $\phi$, that is, when $1 - \rho$ is small compared with $1 - \phi$.
Persistence in the expected return process increases the variability of realized returns, for small but persistent changes in expected returns have large
effects on prices and thus on realized returns.
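The algebra behind (7.1.31) can be checked by computing the variance of (7.1.30) term by term. This uses the same assumption that dividend news and expected-return news are uncorrelated; $x_t$ is known at time $t$ and so is uncorrelated with either piece of news. The sketch below compares that term-by-term variance with the closed form and with the approximation. The parameter values and function names are hypothetical.

```python
def var_closed_form(sigma_d2, sigma_x2, phi, rho):
    """Middle expression of (7.1.31)."""
    return sigma_d2 + sigma_x2 * (1 + rho**2 - 2 * rho * phi) / (1 - rho * phi) ** 2


def var_term_by_term(sigma_d2, sigma_x2, phi, rho):
    """Var of (7.1.30): x_t, eta_{d,t+1} and xi_{t+1} are mutually uncorrelated here."""
    sigma_xi2 = (1 - phi**2) * sigma_x2
    return sigma_x2 + sigma_d2 + rho**2 * sigma_xi2 / (1 - rho * phi) ** 2


def var_approx(sigma_d2, sigma_x2, phi):
    """Right-hand approximation in (7.1.31)."""
    return sigma_d2 + 2 * sigma_x2 / (1 - phi)


sigma_d2, sigma_x2 = 0.04**2, 0.002**2      # hypothetical monthly variances
for phi in (0.0, 0.9, 0.98):
    print(phi, var_closed_form(sigma_d2, sigma_x2, phi, rho=0.9999),
          var_approx(sigma_d2, sigma_x2, phi))
```

With $\phi = 0.98$, the expected-return component contributes roughly $2/(1-\phi) = 100$ times $\sigma_x^2$ to the return variance. A small but persistent $x_t$ can therefore account for a sizable share of return volatility.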

Equations (7.1.28) and (7.1.30) can also be used to show that realized
stock returns follow an ARMA(1,1) process and to calculate the autocorrelations of this process. There are offsetting effects: The positive autocorrelations of expected returns in (7.1.28) appear in realized returns as well, but
a positive innovation to future expected returns causes a contemporaneous
capital loss, and this introduces negative autocorrelation into realized returns. In the ARMA(1,1) representation the AR coefficient is the positive
persistence parameter $\phi$, but the MA coefficient is negative. Problem 7.3
explores these effects in detail, showing that the latter effect dominates provided that $\phi < \rho$. Thus there is some presumption that changing expected
returns create negative autocorrelations in realized returns.
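The sign of the first-order autocovariance can be worked out from (7.1.28) and (7.1.30) with uncorrelated news: $\mathrm{Cov}(r_{t+2}, r_{t+1}) = \phi\sigma_x^2 - \rho\sigma_\xi^2/(1-\rho\phi) = \sigma_x^2(\phi-\rho)/(1-\rho\phi)$, which is negative exactly when $\phi < \rho$. The sketch below compares this expression with a simulation. Normal shocks, the parameter values, and the function names are assumptions of the sketch, not part of the text.

```python
import numpy as np


def simulate_returns(T, phi, rho, sigma_xi, sigma_d, r_bar=0.0, seed=0):
    """Simulate (7.1.28) and (7.1.30) with independent normal shocks (an assumption)."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, sigma_xi, T + 1)
    eta_d = rng.normal(0.0, sigma_d, T)
    x = np.empty(T + 1)
    x[0] = rng.normal(0.0, sigma_xi / np.sqrt(1 - phi**2))  # start in the stationary distribution
    for t in range(T):
        x[t + 1] = phi * x[t] + xi[t + 1]
    # r_{t+1} = r + x_t + eta_{d,t+1} - rho * xi_{t+1} / (1 - rho * phi)
    return r_bar + x[:-1] + eta_d - rho * xi[1:] / (1 - rho * phi)


def lag1_autocov(phi, rho, sigma_xi):
    """Theoretical Cov(r_{t+2}, r_{t+1}) = sigma_x^2 (phi - rho) / (1 - rho*phi)."""
    sigma_x2 = sigma_xi**2 / (1 - phi**2)
    return sigma_x2 * (phi - rho) / (1 - rho * phi)


phi, rho, sigma_xi, sigma_d = 0.9, 0.96, 0.01, 0.02   # hypothetical, with phi < rho
r = simulate_returns(1_000_000, phi, rho, sigma_xi, sigma_d)
dev = r - r.mean()
sample = np.mean(dev[1:] * dev[:-1])
print(sample, lag1_autocov(phi, rho, sigma_xi))       # both negative
```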

Problem 7.3 also generalizes the example to allow for a nonzero covari- 
ance between dividend news and expected-return news. Stock returns can 


This might be the case, for example, if expected returns are determined by the volatility
of the dividend growth process, and dividend volatility is driven by a GARCH model of the type
discussed in Chapter 12 so that shocks to volatility are uncorrelated with shocks to the level of
dividends.

be positively autocorrelated if dividend news and expected-return news have
a sufficiently large positive covariance. The covariance between dividend
news and expected-return news can also be chosen to make stock returns
serially uncorrelated. This case, in which stock returns follow a serially
uncorrelated white noise process while expected stock returns follow a persistent AR(1) process, illustrates the possibility that an asset market may be
weak-form efficient (returns are unforecastable from the history of returns
themselves) but not semistrong-form efficient (returns are forecastable from
the information variable $x_t$).
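The weak-form-efficient case can be illustrated by simulation. Extending the calculation for the AR(1) example gives the lag-$\ell$ autocovariance of returns as $\phi^{\ell-1}[\sigma_x^2(\phi-\rho)/(1-\rho\phi) + \mathrm{Cov}(\eta_{d,t+1},\xi_{t+1})]$. This is our own derivation under the text's assumptions, not a result quoted from Problem 7.3. Choosing the covariance to make the bracket zero therefore removes all autocorrelation. The sketch below does this with jointly normal shocks (an assumption) and hypothetical parameters. It then checks that returns are nearly uncorrelated with their own past but still have a slope close to one on $x_t$.

```python
import numpy as np


def simulate_with_correlated_news(T, phi, rho, sigma_xi, sigma_d, cov_d_xi, seed=0):
    """Simulate x_t from (7.1.28) and r_{t+1} from (7.1.30) with Cov(eta_d, xi) = cov_d_xi.

    The shocks are drawn jointly normal, which is an assumption of this sketch.
    Returns the regressor x_t and the return r_{t+1}, aligned element by element.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.multivariate_normal([0.0, 0.0],
                                     [[sigma_xi**2, cov_d_xi], [cov_d_xi, sigma_d**2]],
                                     size=T)
    xi, eta_d = shocks[:, 0], shocks[:, 1]
    x = np.empty(T + 1)
    x[0] = rng.normal(0.0, sigma_xi / np.sqrt(1 - phi**2))
    for t in range(T):
        x[t + 1] = phi * x[t] + xi[t]
    r = x[:-1] + eta_d - rho * xi / (1 - rho * phi)
    return x[:-1], r


phi, rho, sigma_xi, sigma_d = 0.9, 0.96, 0.01, 0.04   # hypothetical values
sigma_x2 = sigma_xi**2 / (1 - phi**2)
# Setting sigma_x^2 (phi - rho)/(1 - rho*phi) + Cov(eta_d, xi) = 0 makes every
# return autocovariance phi^(lag-1) * [that bracket] equal to zero.
cov_star = sigma_x2 * (rho - phi) / (1 - rho * phi)
x_lag, r = simulate_with_correlated_news(500_000, phi, rho, sigma_xi, sigma_d, cov_star)

dev = r - r.mean()
autocorr1 = np.mean(dev[1:] * dev[:-1]) / np.mean(dev**2)
slope_on_x = np.polyfit(x_lag, r, 1)[0]
print(autocorr1)    # close to zero: returns look unpredictable from their own past
print(slope_on_x)   # close to one: returns are predictable from x_t
```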

This possibility seems to be empirically relevant for the US stock market.
The statistically insignificant long-horizon autocorrelations reported at the
end of Chapter 2 imply that there is only weak evidence for predictability of
long-horizon stock returns given past stock returns. In the next section, however,
we show that there is stronger evidence for predictability of long-horizon
returns given other information variables.

We now use the identities discussed in the previous section to interpret
recent empirical findings on the time-series behavior of US stock prices.
Section 7.2.1 discusses empirical work that predicts stock returns over long
horizons, using forecasting variables other than past returns themselves. We
present illustrative empirical results when dividend-price ratios and interest-rate variables are used to forecast stock returns. Section 7.2.2 relates long-horizon return behavior to price behavior, in particular stock price volatility.
Section 7.2.3 shows how time-series models can be used to calculate the long-horizon implications of short-horizon asset market behavior.


7.2.1 Long-Horizon Regressions 


Recently there has been much interest in regressions of returns, measured
over various horizons, onto forecasting variables. Popular forecasting variables include ratios of price to dividends or earnings (see Campbell and
Shiller [1988a,b], Fama and French [1988a], Hodrick [1992], and Shiller
[1984]), and various interest rate measures such as the yield spread between long- and short-term rates, the quality yield spread between low- and
high-grade corporate bonds or commercial paper, and measures of recent
changes in the level of short rates (see Campbell [1987], Fama and French
[1989], Hodrick [1992], and Keim and Stambaugh [1986]).

Here we concentrate on the dividend-price ratio, which in US data is the
most successful forecasting variable for long-horizon returns, and on a short-term nominal interest-rate variable. We start with prices and dividends on
the value-weighted CRSP index of stocks traded on the NYSE, the AMEX, and
the NASDAQ. The dividend-price ratio is measured as the sum of dividends
paid on the index over the previous year, divided by the current level of the
index. Summing dividends over a full year removes any seasonal patterns in
dividend payments, while using the current stock index incorporates the
most recent information in stock prices.

The interest-rate variable is a transformation of the one-month nominal US Treasury bill rate, motivated by the fact that unit-root tests often fail to reject the hypothesis that the bill rate has a unit root. We subtract a backward one-year moving average of past bill rates from the current bill rate to get a stochastically detrended interest rate that is equivalent to a triangularly weighted moving average of past changes in bill rates, where the weights decline as one moves back in time. Accordingly, the detrended interest rate is stationary if changes in bill rates are stationary. This stochastic detrending method has been used by Campbell (1991) and Hodrick (1992).
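The equivalence between the moving-average detrending and a triangularly weighted sum of past changes can be checked numerically. The sketch below assumes monthly data and reads "a backward one-year moving average of past bill rates" as the average of the twelve preceding months, excluding the current month; under that reading the weights on $\Delta y_t, \Delta y_{t-1}, \ldots, \Delta y_{t-11}$ are $12/12, 11/12, \ldots, 1/12$. The function names are illustrative.

```python
import numpy as np

def detrend_bill_rate(y, window=12):
    """Current rate minus the average of the previous `window` rates."""
    y = np.asarray(y, dtype=float)
    out = np.full(y.shape, np.nan)
    for t in range(window, len(y)):
        # Assumption: the moving average covers months t-12..t-1, not month t.
        out[t] = y[t] - y[t - window:t].mean()
    return out

def triangular_weighted_changes(y, window=12):
    """The same series written as a triangularly weighted sum of past changes."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y, prepend=np.nan)                    # dy[t] = y[t] - y[t-1]
    weights = (window - np.arange(window)) / window    # 12/12, 11/12, ..., 1/12
    out = np.full(y.shape, np.nan)
    for t in range(window, len(y)):
        recent = dy[t - window + 1:t + 1][::-1]        # dy[t], dy[t-1], ..., dy[t-11]
        out[t] = weights @ recent
    return out
```

Since the second form involves only changes in the bill rate, it is stationary whenever those changes are, which is the property the text relies on.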

Table 7.1 shows a typical set of results when the dividend-price ratio is used to forecast returns. The table reports monthly regressions of log real stock returns onto the log of the dividend-price ratio at the start of the holding period. Returns are measured over a holding period of K months, which ranges from one month to 48 months (four years); whenever K > 1, the regressions use overlapping monthly data. Results are reported for the period 1927 to 1994 and also for the subsamples 1927 to 1951 and 1952 to 1994. For each regression Table 7.1 reports the R² statistic and the t-statistic for the hypothesis that the coefficient on the log dividend-price ratio is zero. The t-statistic is corrected for heteroskedasticity and serial correlation in the equation error using the asymptotic theory discussed in the Appendix. Table 7.1 follows Fama and French (1988a) except that the regressor is the log dividend-price ratio rather than the level of the dividend-price ratio (a change which makes very little difference to the results), overlapping monthly data are used for all horizons, and the sample periods are updated. Although the results in the table are for real stock returns, almost identical results are obtained for excess returns over the one-month Treasury bill rate.
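A minimal sketch of the overlapping-data regression and its two standard-error corrections may help fix ideas. It assumes that `r[t]` is the one-period log return earned after the predictor `x[t]` is observed, includes a constant, and builds the Hansen and Hodrick (1980) estimator with uniform weights on autocovariances up to lag K − 1 and the Newey and West (1987) estimator with Bartlett weights and q = 2(K − 1) lags by default. It uses a generic sandwich formula rather than reproducing equation (A.3.3) of the Appendix, and the function names are hypothetical.

```python
import numpy as np

def _hac_cov(X, u, weights):
    """Sandwich covariance of OLS coefficients with weighted autocovariances."""
    n = X.shape[0]
    g = X * u[:, None]                          # moment contributions X_t * u_t
    S = g.T @ g / n                             # lag-0 term
    for j, w in enumerate(weights, start=1):
        gamma = g[j:].T @ g[:-j] / n            # lag-j autocovariance of X_t u_t
        S += w * (gamma + gamma.T)
    Q_inv = np.linalg.inv(X.T @ X / n)
    return Q_inv @ S @ Q_inv / n

def long_horizon_regression(r, x, K, nw_lags=None):
    """OLS of r_{t+1} + ... + r_{t+K} on a constant and x_t, overlapping data.

    Assumption: r[t] is the one-period return that follows x[t].
    """
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.convolve(r, np.ones(K), mode="valid")   # y[t] = r[t] + ... + r[t+K-1]
    n = len(y)
    X = np.column_stack([np.ones(n), x[:n]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    r2 = 1.0 - (u @ u) / np.sum((y - y.mean()) ** 2)
    hh = np.ones(K - 1)                             # Hansen-Hodrick: uniform, lags 1..K-1
    q = 2 * (K - 1) if nw_lags is None else nw_lags
    nw = 1.0 - np.arange(1, q + 1) / (q + 1)        # Newey-West: Bartlett weights
    out = {"beta": beta[1], "R2": r2}
    for name, w in (("t_HH", hh), ("t_NW", nw)):
        var = _hac_cov(X, u, w)[1, 1]
        # The uniform-weight estimator need not be positive; report NaN then.
        out[name] = beta[1] / np.sqrt(var) if var > 0 else np.nan
    return out
```

When the uniform-weight estimate fails to be positive, the sketch returns NaN for that t-statistic, which corresponds to the cases flagged in the notes to Table 7.2.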

At a horizon of one month, the regression results in Table 7.1 are rather unimpressive: The R² statistics never exceed 2%, and the t-statistics exceed 2 only in the post-World War II subsample. The striking fact about the table is how much stronger the results become when one increases the horizon K. At a two-year horizon the R² statistic is 14% for the full sample, 22% for the prewar subsample, and 32% for the postwar subsample; at a four-year horizon the R² statistic is 26% for the full sample and 42% for each of the subsamples. In the full sample and the prewar subsample the regression t


¹⁰This way of measuring the dividend-price ratio is standard in the academic literature, and it is also commonly used in the financial press.
Table 7.1. Long-horizon regressions of log stock returns on the log dividend-price ratio.

$$r_{t+1} + \cdots + r_{t+K} = \beta(K)\,(d_t - p_t) + \eta_{t+K,K}$$

                            Forecast Horizon (K)
                  1        3       12       24       36       48
1927 to 1994
  β(K)        0.012    0.044    0.191    0.383    0.528    0.654
  R²(K)       0.004    0.015    0.068    0.144    0.209    0.267
  t(β(K))     1.991    1.400    2.070   4.1123    4.631    3.943
1927 to 1951
  β(K)        0.015    0.059    0.274    0.629    0.880    1.050
  R²(K)       0.003    0.014    0.074    0.207    0.322    0.424
  t(β(K))     0.660    0.844    1.677    4.521    2.967    3.783
1952 to 1994
  β(K)         0.04    0.079    0.329    0.601    0.776    0.863
  R²(K)       0.059    0.047    0.190    0.344    0.498    0.439
  t(β(K))     2.733    3.055    3.228    3.225    3.315    3.561

r is the log real return on a value-weighted index of NYSE, AMEX, and NASDAQ stocks. (d − p) is the log ratio of dividends over the last year to the current price. Regressions are estimated by OLS, with Hansen and Hodrick (1980) standard errors, calculated from equation (A.3.3) in the Appendix setting autocovariances beyond lag K − 1 to zero. Newey and West (1987) standard errors with q = K − 1 or q = 2(K − 1) are very similar and typically are slightly smaller than those reported in the table.


statistics also increase dramatically with the forecast horizon, although they 
are fairly stable within the range 3.0 to 3.5 in the postwar subsample. 

It is interesting to compare the results in Table 7.1 with those obtained
when stock returns are regressed onto the stochastically detrended short- 
term interest rate in Table 7.2. The regressions reported in Table 7.2 are 
run in just the same way as those in Table 7.1. Once again almost identical 


results are obtained if real returns are replaced by excess returns over the 
one-month Treasury bill rate. 


Table 7.2 shows that, like the dividend-price ratio, the stochastically detrended short rate has some ability to forecast stock returns. However, this forecasting power is very different in two respects. First, it is concentrated in the postwar subsample; this is not surprising since short-term interest rates were pegged by the Federal Reserve during much of the 1930s and 1940s, and so the detrended short rate hardly varies in these years. Second, the forecasting power of the short rate is at much shorter horizons than the
Table 7.2. Long-horizon regressions of log stock returns on the stochastically detrended short-term interest rate.

$$r_{t+1} + \cdots + r_{t+K} = \beta(K)\left(y_{1t} - \frac{1}{12}\sum_{i=1}^{12} y_{1,t-i}\right) + \eta_{t+K,K}$$

                              Forecast Horizon (K)
                  1          3         12         24         36         48
1927 to 1994
  β(K)       −5.468    −17.181    −41.663     −4.492    −96.148    −20.129
  R²(K)       0.00%      0.016      0.023      0.000      0.004      0.002
  t(β(K))    −2.299     −2.582     −1.564     −0.164    −1.341*    −0.838*
1927 to 1951
  β(K)        3.144     −6.183     73.712    158.989    −67.505    −50.900
  R²(K)       0.000      0.000      0.012      0.031      0.005      0.002
  t(β(K))     0.222     −0.165      0.520      1.662    −0.637*    −0.580*
1952 to 1994
  β(K)       −6.547    −18.621    −56.406    −26.115    −26.573    −25.894
  R²(K)       0.019      0.047      0.103      0.013      0.010      0.008
  t(β(K))    −3.263     −3.206     −2.741     −1.354    −1.555*     −1.092

r is the log real return on a value-weighted index of NYSE, AMEX, and NASDAQ stocks; $y_1$ is the 1-month nominal Treasury bill rate. Regressions are estimated by OLS, with Hansen and Hodrick (1980) standard errors, calculated from equation (A.3.3) in the Appendix setting autocovariances beyond lag K − 1 to zero. Newey and West (1987) standard errors, with q = 2(K − 1), are used when the Hansen and Hodrick (1980) covariance matrix estimator is not positive definite. The cases where this occurs are marked *.


forecasting power of the dividend-price ratio. The postwar R² statistics are comparable to those in Table 7.1 at horizons of one or three months, but they peak at 0.10 at a horizon of one year and then rapidly decline. The regression t-statistics are likewise insignificant beyond a one-year horizon.

How can we understand the hump-shaped pattern of R² statistics and t-statistics in Table 7.2 and the strongly increasing pattern in Table 7.1? At one level, the results in Table 7.1 can be understood by recalling the formula relating the log dividend-price ratio to expectations of future returns and dividend growth rates, given above as (7.1.24):


$$d_t - p_t = \mathrm{E}_t\left[\sum_{j=0}^{\infty}\rho^j\left(-\Delta d_{t+1+j} + r_{t+1+j}\right)\right].$$


This expression shows that the log dividend-price ratio will be a good proxy for market expectations of future stock returns, provided that expectations of future dividend growth rates are not too variable. Moreover, in general the log dividend-price ratio will be a better proxy for expectations of long-horizon returns than for expectations of short-horizon returns, because the expectations on the right-hand side of (7.1.24) are of a discounted value of all returns into the infinite future. This may help to explain the improvement in forecast power as the horizon increases in Table 7.1.

Even in the absence of this effect, however, it is possible to obtain results like those in Tables 7.1 and 7.2. To see this we now return to our AR(1) example in which the variable $x_t$, a perfect proxy for the expected stock return at any horizon, is observable and can be used as a regressor by the econometrician. Problem 7.4 develops a structural model of stock prices and dividends in which a multiple of the log dividend-price ratio has the properties of the variable $x_t$ in the AR(1) example.

We use the AR(1) example to show that when $x_t$ is persistent, the R² of a return regression on $x_t$ is very small at a short horizon; as the horizon increases, the R² first increases and then eventually decreases. We also discuss finite-sample difficulties with statistical inference in long-horizon regressions.


R² Statistics

First consider regressing the one-period return $r_{t+1}$ on the variable $x_t$. For simplicity, we will ignore constant terms since these are not the objects of interest; constants could be included in the regression, or we could simply work with demeaned data. In population, $\beta(1) = 1$, so the fitted value is just $x_t$ itself, with variance $\sigma_x^2$, while the variance of the return is given by equation (7.1.31) above. It follows that the one-period regression R² statistic, which we write as $R^2(1)$, is


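$$R^2(1) = \frac{\sigma_x^2}{\mathrm{Var}[r_{t+1}]} \approx \left(\frac{1-\phi}{2}\right)\left[1 + \left(\frac{1-\phi}{2}\right)\frac{\sigma_d^2}{\sigma_x^2}\right]^{-1}, \tag{7.2.1}$$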


where for simplicity we are using the approximate version of (7.1.31) that holds when $\phi < \rho$ and $\rho$ is close to one. $R^2(1)$ reaches an upper bound of $(1-\phi)/2$ when the variability of dividend news, $\sigma_d^2$, is zero. Thus even when a stock is effectively a real consol bond with known real dividends, so that all variation in its price is due to changing expected returns, the one-period R² statistic will be small when $\phi$ is large. The reason is that innovations to expected returns cause large unforecastable changes in stock prices when expected returns are persistent.

The behavior of the R² statistic in a long-horizon regression is somewhat more complicated. A regression with horizon K takes the form


$$r_{t+1} + \cdots + r_{t+K} = \beta(K)\,x_t + \eta_{t+K,K}. \tag{7.2.2}$$


In the AR(1) example, the best forecast of the one-period return j periods ahead is always $\mathrm{E}_t[r_{t+j}] = \phi^{j-1}x_t$. The best forecast of the cumulative return over K months is found by summing the forecasts of one-period returns up to horizon K, so $\beta(K) = 1 + \phi + \cdots + \phi^{K-1} = (1-\phi^K)/(1-\phi)$. The R² statistic for the K-period regression is given by

$$R^2(K) = \frac{\mathrm{Var}\left[\mathrm{E}_t r_{t+1} + \cdots + \mathrm{E}_t r_{t+K}\right]}{\mathrm{Var}\left[r_{t+1} + \cdots + r_{t+K}\right]}. \tag{7.2.3}$$


Dividing by the one-period R² statistic and rearranging, we obtain


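$$\frac{R^2(K)}{R^2(1)} = \left(\frac{\mathrm{Var}\left[\mathrm{E}_t r_{t+1} + \cdots + \mathrm{E}_t r_{t+K}\right]}{\mathrm{Var}\left[\mathrm{E}_t r_{t+1}\right]}\right)\left(\frac{\mathrm{Var}[r_{t+1}]}{\mathrm{Var}\left[r_{t+1} + \cdots + r_{t+K}\right]}\right). \tag{7.2.4}$$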


The first ratio on the right-hand side of (7.2.4) is just the square of the K-period regression coefficient divided by the square of the one-period regression coefficient. In the AR(1) example this is $(1-\phi^K)^2/(1-\phi)^2$, which is approximately equal to $K^2$ for large $\phi$ and small K. The second ratio on the right-hand side of (7.2.4) is closely related to the variance ratio discussed in Chapter 2. In fact, it can be rewritten as $1/(K\,V(K))$, where $V(K)$ is the K-period variance ratio for stock returns. In the AR(1) example, Problem 7.3 shows that the autocorrelations of stock returns are all negative. It follows that $V(K) < 1$, so $1/(K\,V(K)) > 1/K$.

Putting the two terms on the right-hand side of (7.2.4) together, we find that if expected stock returns are very persistent, the multiperiod R² statistic grows at first approximately in proportion to the horizon K. This behavior is well illustrated by the results in Table 7.1. Intuitively, it occurs because forecasts of expected returns several periods ahead are only slightly less variable than the forecast of the next period's expected return, and they are perfectly correlated with it. Successive realized returns, on the other hand, are slightly negatively correlated with one another. Thus at first the variance of the multiperiod fitted value grows more rapidly than the variance of the multiperiod realized return, increasing the multiperiod R² statistic. Eventually, of course, forecasts of returns in the distant future die out so the first ratio on the right-hand side of (7.2.4) converges to a fixed limit; but the variability of realized multiperiod returns continues to increase, so the second ratio on the right-hand side of (7.2.4) becomes proportional to 1/K. Thus eventually multiperiod R² statistics go to zero as the horizon increases.

It may be helpful to give an even more explicit formula for the AR(1) example in the case where the long horizon is just two periods, that is, where K = 2. In this case tedious but straightforward calculations and the simplifying approximation that holds when $\phi < \rho$ and $\rho$ is close to one yield


$$\frac{R^2(2)}{R^2(1)} \approx \frac{(1+\phi)^2\left[2 + (1-\phi)(\sigma_d^2/\sigma_x^2)\right]}{2(1+\phi) + 2(1-\phi)(\sigma_d^2/\sigma_x^2)}. \tag{7.2.5}$$


The ratio in (7.2.5) approaches $(1+\phi)$ as $\sigma_d^2/\sigma_x^2$ approaches zero, so a two-period regression may have an R² statistic almost twice that of a one-period regression if expected returns are persistent and highly variable. On the other hand the ratio approaches $(1+\phi)^2/2$ as $\sigma_d^2/\sigma_x^2$ approaches infinity, so a two-period regression may have an R² statistic only half that of a one-period regression if expected returns have only small and transitory variation.
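Equation (7.2.5) is easy to evaluate directly. The helper below, whose name is illustrative, takes $\phi$ and the ratio $s = \sigma_d^2/\sigma_x^2$ and returns the approximate ratio $R^2(2)/R^2(1)$, so the two limiting cases described above can be checked numerically.

```python
def r2_ratio_two_period(phi, s):
    """Approximate R^2(2) / R^2(1) from (7.2.5), with s = sigma_d^2 / sigma_x^2."""
    num = (1 + phi) ** 2 * (2 + (1 - phi) * s)
    den = 2 * (1 + phi) + 2 * (1 - phi) * s
    return num / den

# No dividend news: the ratio equals 1 + phi (close to 2 for persistent x_t).
print(r2_ratio_two_period(0.9, 0.0))
# Dividend news dominates: the ratio tends to (1 + phi)^2 / 2 (0.5 when phi = 0).
print(r2_ratio_two_period(0.0, 1e12))
```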

Calculations for horizons beyond two periods become very messy, but Campbell (1993b) reports some numerical results. When $\phi = 0.98$, $\rho = 0.995$, and $\sigma_d^2/\sigma_x^2 = 0$, for example, a one-period regression has an R² statistic of only 1.5%, but the maximum R² is 63% for a 152-period regression. When the forecasting variable is highly persistent, the R² statistic can continue to rise out to extremely long horizons.
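The exact population $R^2(K)$ in the AR(1) example can be computed from the autocovariances of returns, which sidesteps the messy algebra. This sketch assumes the structure of the example as summarized in this section: $\mathrm{E}_t[r_{t+1}] = x_t$ (demeaned), $x_{t+1} = \phi x_t + \xi_{t+1}$, and a return innovation $\eta_{t+1} = \eta_{d,t+1} - \rho\xi_{t+1}/(1-\rho\phi)$ with dividend news uncorrelated with $\xi_{t+1}$, consistent with the ratio $\sigma_{\eta\xi}/\sigma_\xi^2 = -\rho/(1-\rho\phi)$ quoted in the discussion of finite-sample bias. It uses no approximation in $\rho$, so its numbers need not match Campbell's (1993b) exactly; with $\phi = 0.98$, $\rho = 0.995$, and $\sigma_d^2 = 0$ it gives a one-period R² of about 1.5% and a maximum in the mid-60s percent at a horizon of roughly 150 periods.

```python
import numpy as np

def r2_curve(phi, rho, s, K_max):
    """Population R^2(K), K = 1..K_max, in the AR(1) example.

    Assumed model: r_{t+1} = x_t + eta_d - rho * xi_{t+1} / (1 - rho*phi),
    x_{t+1} = phi x_t + xi_{t+1}; Var[x_t] normalized to 1; s = sigma_d^2 / sigma_x^2.
    """
    var_x = 1.0
    var_xi = (1.0 - phi**2) * var_x             # innovation variance of x_t
    b = rho / (1.0 - rho * phi)                 # loading of returns on -xi_{t+1}
    gamma0 = var_x + s * var_x + b**2 * var_xi  # Var[r_{t+1}]
    c = phi * var_x - b * var_xi                # autocovariance at lag k is phi**(k-1) * c
    r2 = np.empty(K_max)
    for K in range(1, K_max + 1):
        k = np.arange(1, K)
        # Var[r_{t+1} + ... + r_{t+K}] = K * gamma0 + 2 * sum_k (K - k) * gamma_k
        var_sum = K * gamma0 + 2.0 * c * np.sum((K - k) * phi ** (k - 1))
        beta_K = (1.0 - phi**K) / (1.0 - phi)   # beta(K) = 1 + phi + ... + phi^(K-1)
        r2[K - 1] = beta_K**2 * var_x / var_sum
    return r2

r2 = r2_curve(phi=0.98, rho=0.995, s=0.0, K_max=400)
print(r2[0], int(np.argmax(r2)) + 1, r2.max())
```

Because every return autocovariance is negative in this example, the variance of the summed return grows more slowly than K times the one-period variance, which is what lets the R² climb for so long before turning down.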


Difficulties with Inference in Finite Samples
The t-statistics reported in Tables 7.1 and 7.2 are based on the asymptotic theory summarized in the Appendix. There are, however, a number of pitfalls in applying this theory to regressions of returns onto the information variable $x_t$.

A first problem arises from the fact that in the regression of the one-period return $r_{t+1}$ on $x_t$, $r_{t+1} = \beta(1)x_t + \eta_{t+1}$, the regressor $x_t$ is correlated with past error terms $\eta_{t+1-i}$ for $i > 0$, even though it is not correlated with contemporaneous or future error terms $\eta_{t+1+i}$ for $i \ge 0$. These correlations exist because shocks to the state variable $x_t$ are correlated with shocks to returns, and the variable $x_t$ is persistent. In the language of econometrics, the regressor $x_t$ is predetermined, but it is not exogenous. This leads to finite-sample bias in the coefficient of a regression of returns on $x_t$. In the AR(1) example, there is a simple formula for the bias when the regression horizon is one period:


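$$\mathrm{E}\left[\hat\beta(1) - \beta(1)\right] = \left(\frac{\sigma_{\eta\xi}}{\sigma_\xi^2}\right)\mathrm{E}\left[\hat\phi - \phi\right] \approx -\left(\frac{\sigma_{\eta\xi}}{\sigma_\xi^2}\right)\left(\frac{1+3\phi}{T}\right) = \left(\frac{\rho}{1-\rho\phi}\right)\left(\frac{1+3\phi}{T}\right). \tag{7.2.6}$$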


The term $-(1+3\phi)/T$ is the Kendall (1954) expression for the bias in the OLS estimate of the persistence parameter $\phi$ obtained by regressing $x_{t+1}$ on $x_t$. As Stambaugh (1986) has shown, this bias leads to a bias in the OLS estimate of the coefficient $\beta(1)$ when the return innovation $\eta_{t+1}$ covaries with the innovation in the forecasting variable $\xi_{t+1}$. In our simple example with uncorrelated news about dividends and future returns driving current returns, the ratio $\sigma_{\eta\xi}/\sigma_\xi^2 = -\rho/(1-\rho\phi)$, which produces the second equality in (7.2.6). This bias can be substantial: With $\rho = 0.997$, for example, it equals $36/T$ when $\phi = 0.9$, $73/T$ when $\phi = 0.95$, and $171/T$ when $\phi = 0.98$.¹¹
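The bias magnitudes quoted above follow directly from the last expression in (7.2.6). A short check, with an illustrative function name, reproduces them for $\rho = 0.997$; dividing by the sample length T gives the approximate bias for that sample.

```python
def beta_bias_numerator(phi, rho):
    """c in E[beta_hat(1) - beta(1)] ~ c / T, from the last equality in (7.2.6)."""
    kendall = 1.0 + 3.0 * phi        # minus T times Kendall's bias in phi_hat
    return (rho / (1.0 - rho * phi)) * kendall

for phi in (0.9, 0.95, 0.98):
    c = beta_bias_numerator(phi, rho=0.997)
    print(f"phi = {phi}: bias ~ {c:.0f}/T, i.e. {c / 500:.3f} when T = 500")
```

The factor $\rho/(1-\rho\phi)$ explodes as $\phi$ approaches one, which is why the bias is most severe for exactly the persistent predictors that look most promising in long-horizon regressions.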

A second problem is that the asymptotic theory given in the Appendix may be misleading in finite samples when the horizon K is large relative to the sample size. Hodrick (1992) and Nelson and Kim (1993) use Monte Carlo methods to illustrate this for the case where stock returns are regressed on dividend-price ratios. Richardson and Stock (1989) show that the finite-sample properties of regressions with large K can be accounted for using an alternative asymptotic theory in which K increases asymptotically at the same rate as the sample size. Section 2.5.1 of Chapter 2 discusses the application of their theory to univariate regressions of returns on past returns.

One way to avoid this problem is to transform the basic regression so that it no longer has overlapping residuals. This has been proposed by Jegadeesh (1991) for the Fama and French (1988b) regression of returns on lagged returns, and it has been advocated more generally by Cochrane (1991). For example, we might estimate


$$r_{t+1} = \gamma(K)\left(\sum_{i=0}^{K-1} x_{t-i}\right) + u_{t+1}, \tag{7.2.7}$$
where the error term $u_{t+1}$ is now serially uncorrelated. The numerator of the regression coefficient $\gamma(K)$ in (7.2.7) is the same as the numerator of the regression coefficient $\beta(K)$ in (7.2.2), because the covariance of x measured at one date and r measured at another date depends only on the difference between the two dates. Hence $\gamma(K) = 0$ in (7.2.7) if and only if $\beta(K) = 0$ in (7.2.2). However, it does not necessarily follow that tests of $\gamma(K) = 0$ and $\beta(K) = 0$ have the same asymptotic properties under the null or general alternative hypotheses. Hodrick (1992) presents Monte Carlo evidence on the distributions of both kinds of test statistics; he finds that they both tend to reject the null too often if asymptotic critical values are used, so that the long-horizon t-statistics reported in Tables 7.1 and 7.2 should be treated with caution. However, he also finds that these biases are not strong enough to account for the evidence of return predictability reported in the tables.
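The transformed regression (7.2.7) can be sketched as follows. The alignment convention (`r[t]` is the return earned after `x[t]` is observed) and the function name are assumptions. Because the error term has no overlap-induced serial correlation, the sketch reports a conventional OLS standard error, which does not correct for any heteroskedasticity that may remain.

```python
import numpy as np

def summed_regressor_regression(r, x, K):
    """OLS of r_{t+1} on x_t + x_{t-1} + ... + x_{t-K+1}, as in (7.2.7).

    Assumption: r[t] is the one-period return that follows x[t].
    Returns (gamma_hat, t_stat) using a conventional OLS standard error.
    """
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.convolve(x, np.ones(K), mode="valid")   # z[i] = x[i] + ... + x[i+K-1]
    y = r[K - 1:]                                  # return following x[i+K-1]
    n = len(y)
    X = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    s2 = (u @ u) / (n - 2)                         # residual variance, 2 parameters
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], coef[1] / se
```

The long-horizon information enters through the regressor rather than the dependent variable, so each observation contributes a non-overlapping one-period return.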

An important unresolved question is whether there are circumstances under which long-horizon regressions have greater power to detect deviations from the null hypothesis than do short-horizon regressions. Hodrick (1992) and Mark (1995) present some suggestive Monte Carlo evidence that this may be the case, and Campbell (1993b) also studies the issue, but the literature has not reached any firm conclusion at this stage.


¹¹Similar biases afflict the regression when the horizon is greater than one period. See Hodrick (1992) and Mark (1995) for Monte Carlo evidence on this point.
7.2.2 Volatility Tests


In the previous section we have explored regressions whose dependent vari- 
ables are returns measured over long horizons. One motivation for such 
regressions is that asset prices are influenced by expectations of returns into 
the distant future, so long-horizon procedures are necessary if we are to un- 
derstand price behavior. We now turn to empirical work that looks at price 
variability more directly. 

LeRoy and Porter (1981) and Shiller (1981) started a heated debate in 
the early 1980s by arguing that stock prices are too volatile to be rational 
forecasts of future dividends discounted at a constant rate. This contro- 
versy has since died down, partly because it is now more clearly understood 
that a rejection of constant-discount-rate models is not the same as a rejec- 
tion of the Efficient Markets Hypothesis, and partly because regression tests 
have convinced many financial economists that expected stock returns are 
time-varying rather than constant. Nonetheless the volatility literature has 
introduced some important ideas that are closely connected with the work 
on multiperiod return regressions discussed in the previous section. Useful 
surveys of this literature include Gilles and LeRoy (1991), LeRoy (1989), 
Shiller (1989, Chapter 4), and West (1988a). 

The early papers in the volatility literature used levels of stock prices 
and dividends, but here we restate the ideas in logarithmic form. This is 
consistent with the more recent literature and with the exposition in the 
rest of this chapter. We begin by defining a log perfect-foresight stock price, 


$$p_t^* \equiv \sum_{j=0}^{\infty}\rho^j\left[k + (1-\rho)d_{t+1+j} - r\right]. \tag{7.2.8}$$


The perfect-foresight price $p_t^*$ is so named because from the ex post stock price identity (7.1.21) it is the price that would prevail if realized returns were constant at some level r, that is, if there were no revisions in expectations driving unexpected returns. Equivalently, from the ex ante stock price identity (7.1.22) it is the price that would prevail if expected returns were constant and investors had perfect knowledge of future dividends. Substituting (7.2.8) into (7.1.21), we find that


$$p_t^* - p_t = \sum_{j=0}^{\infty}\rho^j\left(r_{t+1+j} - r\right). \tag{7.2.9}$$
The difference between $p_t^*$ and $p_t$ is just a discounted sum of future demeaned stock returns. If we now take expectations and use the definition given in (7.1.22) and (7.1.23) of the price component $p_{rt}$, we find that


$$\mathrm{E}_t\left[p_t^* - p_t\right] = p_{rt} - \mathrm{E}\left[p_{rt}\right]. \tag{7.2.10}$$
Recall that $p_{rt}$ can be interpreted as that component of the stock price which is associated with changing expectations of future stock returns. Thus the conditional expectation of $p_t^* - p_t$ measures the effect of changing expected stock returns on the current stock price. In the AR(1) example developed earlier, the conditional expectation of $p_t^* - p_t$ is just $x_t/(1-\rho\phi)$ from (7.1.29).

If expected stock returns are constant through time, then the right-hand side of (7.2.10) is zero. The constant-expected-return hypothesis implies that $p_t^* - p_t$ is a forecast error uncorrelated with information known at time t. Equivalently, it implies that the stock price is a rational expectation of the perfect-foresight stock price:


$$p_t = \mathrm{E}_t\left[p_t^*\right]. \tag{7.2.11}$$


How can these ideas be used to test the hypothesis that expected stock returns are constant? For simplicity of exposition, we begin by making two unrealistic assumptions: first, that log stock prices and dividends follow stationary stochastic processes, so that they have well-defined first and second moments; and second, that log dividends are observable into the infinite future, so that the perfect-foresight price $p_t^*$ is observable to the econometrician. Below we discuss how these assumptions are relaxed.


Orthogonality and Variance-Bounds Tests
Equation (7.2.11) implies that $p_t^* - p_t$ is orthogonal to information variables known at time t. An orthogonality test of (7.2.11) regresses $p_t^* - p_t$ onto information variables and tests for zero coefficients. If the information variables include the stock price $p_t$ itself, this is equivalent to a regression of $p_t^*$ onto $p_t$ and other variables, where the hypothesis to be tested is now that $p_t$ has a unit coefficient and the other variables have zero coefficients. These regressions are variants of the long-horizon return regressions discussed in the previous section. Equation (7.2.9) shows that $p_t^* - p_t$ is just a discounted sum of future demeaned stock returns, so an orthogonality test of (7.2.11) is a return regression with an infinite horizon, where more distant returns are geometrically downweighted.¹²

Instead of testing orthogonality directly, much of the literature tests the implications of orthogonality for the volatility of stock prices. The most famous such implication, derived by LeRoy and Porter (1981) and Shiller (1981), is the variance inequality for the stock price:


$$\mathrm{Var}[p_t^*] = \mathrm{Var}[p_t] + \mathrm{Var}[p_t^* - p_t] \ge \mathrm{Var}[p_t]. \tag{7.2.12}$$


¹²The downweighting allows the R² statistic in the regression to be positive, whereas we showed in Section 7.2.1 that the R² statistic in an unweighted finite-horizon return regression converges to zero as the horizon increases. Durlauf and Hall (1989), Scott (1985), and Shiller (1989, Chapter 10) have run regressions of this sort.
The equality in (7.2.12) holds because under the null hypothesis (7.2.11) $p_t^* - p_t$ must be uncorrelated with $p_t$, so no covariance term appears in the variance of $p_t^*$; the variance inequality follows directly. Equation (7.2.12) can also be understood by noting that an optimal forecast cannot be more variable than the quantity it is forecasting. With constant expected returns the stock price forecasts only the present value of future dividends, so it cannot be more variable than the realized present value of future dividends. Tests of this and related propositions are known as variance-bounds tests.

As Durlauf and Phillips (1988) point out, variance-bounds tests can be restated as orthogonality tests. To see this, consider a regression of $p_t$ on $p_t^* - p_t$. This is the reverse of the regression considered above, but it too should have a zero coefficient under the null hypothesis. The reverse regression coefficient is always $\theta = \mathrm{Cov}[p_t^* - p_t,\, p_t]/\mathrm{Var}[p_t^* - p_t]$. It is straightforward to show that


$$\frac{\mathrm{Var}[p_t^*] - \mathrm{Var}[p_t]}{\mathrm{Var}[p_t^* - p_t]} = 1 + 2\theta, \tag{7.2.13}$$

so the variance inequality (7.2.12) will be satisfied whenever the reverse regression coefficient $\theta \ge -1/2$. This is a weaker restriction than the orthogonality condition $\theta = 0$, so the orthogonality test clearly has power in some situations where the variance-bounds test has none. The justification for using a variance-bounds test is not increased power; rather it is that a variance-bounds test helps one to describe the way in which the null hypothesis fails.
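The identity (7.2.13) holds exactly for sample moments as long as every variance and covariance uses the same normalization, so it can be verified directly. In the sketch below (function and variable names are illustrative), the first example makes $p_t^*$ equal to $p_t$ plus an independent forecast error, as under the null; the second makes the price noisier than $p_t^*$, which pushes the reverse regression coefficient toward −1 and violates the variance bound.

```python
import numpy as np

def reverse_regression_check(p, p_star):
    """Return (lhs, theta) for (7.2.13) using 1/N moments throughout."""
    p = np.asarray(p, dtype=float)
    p_star = np.asarray(p_star, dtype=float)
    e = p_star - p                                   # p*_t - p_t
    var_e = np.var(e)
    theta = np.cov(e, p, bias=True)[0, 1] / var_e    # slope of p_t on p*_t - p_t
    lhs = (np.var(p_star) - np.var(p)) / var_e
    return lhs, theta

rng = np.random.default_rng(0)
p = rng.standard_normal(10_000)
p_null = p + rng.standard_normal(10_000)             # p* = p + independent error
print(reverse_regression_check(p, p_null))           # lhs near 1, theta near 0

p_star = rng.standard_normal(10_000)
p_noisy = p_star + 2.0 * rng.standard_normal(10_000) # price more volatile than p*
print(reverse_regression_check(p_noisy, p_star))     # theta near -1: bound violated
```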


Unit Roots 
Our analysis so far has assumed that the population variances of log prices and dividends exist. This will not be the case if log dividends follow a unit-root process; then, as Kleidon (1986) points out, the sample variances of prices and dividends can be very misleading. Marsh and Merton (1986) provide a particularly neat example. Suppose that expected stock returns are constant, so the null hypothesis is true. Suppose also that a firm's managers use its stock price as an indicator of "permanent earnings," setting the firm's dividend equal to a constant fraction of its stock price last period. In log form, we have

$$d_t = \theta + p_{t-1}, \tag{7.2.14}$$

where θ is the unique constant that satisfies the null hypothesis (7.2.11). It can be shown that both log dividends and log prices follow unit-root processes in this example. Substituting (7.2.14) into (7.2.8), we find that the perfect-foresight stock price is related to the actual stock price by


$$p_t^* = (1-\rho)\sum_{j=0}^{\infty}\rho^j p_{t+j}. \tag{7.2.15}$$
This is just a smoothed version of the actual stock price $p_t$, so its variance depends on the variance and autocorrelations of $p_t$. Since autocorrelations can never be greater than one, $p_t^*$ must have a lower variance than $p_t$. The importance of this result is not that it applies to population variances (which are not well defined in this example because both log prices and log dividends have unit roots), but that it applies to sample variances in every sample. Thus the variance inequality (7.2.12) will always be violated in the Marsh-Merton example.

This unit-root problem is important, but it is also easy to circumvent. The variable $p_t^* - p_t$ is always stationary provided that stock returns are stationary, so any test that $p_t^* - p_t$ is orthogonal to stationary variables will be well-behaved. The problems pointed out by Kleidon (1986) and Marsh and Merton (1986) arise when $p_t^* - p_t$ is regressed on the stock price $p_t$, which has a unit root. These problems can be avoided by using unit-root regression theory or by choosing a stationary regressor, such as the log dividend-price ratio. Some other ways to deal with the unit-root problem are explored in Problem 7.5.¹³


Finite-Sample Considerations 

So far we have treated the perfect-foresight stock price as if it were an observable variable. But as defined in (7.2.8), the perfect-foresight price is unobservable in a finite sample because it is a discounted sum of dividends out to the infinite future. The definition of $p_t^*$ implies that


$$p_t^* = \sum_{j=0}^{T-t-1}\rho^j\left[k + (1-\rho)d_{t+1+j} - r\right] + \rho^{T-t}p_T^*. \tag{7.2.16}$$


Given data up through time T, the first term on the right-hand side of (7.2.16) is observable but the second term is not.

Following Shiller (1981), one standard response to this difficulty is to replace the unobservable $p_t^*$ by an observable proxy $p_{t|T}^*$ that uses only in-sample information:


$$p_{t|T}^* = \sum_{j=0}^{T-t-1}\rho^j\left[k + (1-\rho)d_{t+1+j} - r\right] + \rho^{T-t}p_T. \tag{7.2.17}$$


Here the terminal value of the actual stock price, $p_T$, is used in place of the terminal value of the perfect-foresight stock price, $p_T^*$.¹⁴ Several points are


¹³Durlauf and Hall (1989) apply unit-root regression theory, while Campbell and Shiller (1988a,b) replace the log stock price with the log dividend-price ratio. Problem 7.5 is based on the work of Mankiw, Romer, and Shapiro (1985) and West (1988b).

¹⁴Shiller (1981) used the sample average price instead of the end-of-sample price in his terminal condition, but later work, including Shiller (1989), follows the approach discussed here.
worth noting about the variable $p_{t|T}^*$. First, if expected returns are constant, (7.2.11) continues to hold when $p_{t|T}^*$ is substituted for $p_t^*$. Thus tests of the constant-expected-return model can use $p_{t|T}^*$. Second, a rational bubble in the stock price will affect both $p_t$ and $p_{t|T}^*$. Thus tests using $p_{t|T}^*$ include bubbles in the null rather than the alternative hypothesis. Third, the difference $p_{t|T}^* - p_t$ can be written as a discounted sum of demeaned stock returns, with the sum terminating at the end of the sample period T rather than at some fixed horizon from the present date t. Thus orthogonality tests using $p_{t|T}^* - p_t$ are just long-horizon return regressions, where future returns are geometrically discounted and the horizon is the end of the sample period.
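Because (7.2.17) is a finite discounted sum with a terminal value, $p_{t|T}^*$ can be built by the backward recursion $p_{t|T}^* = k + (1-\rho)d_{t+1} - r + \rho\,p_{t+1|T}^*$, starting from $p_{T|T}^* = p_T$. The sketch below assumes a hypothetical array `d` holding log dividends for dates 0 through T and treats k, ρ, and r as known constants; in an application they would have to be chosen or estimated.

```python
import numpy as np

def perfect_foresight_price(d, p_T, rho, k=0.0, r_bar=0.0):
    """Return p*_{t|T} for t = 0..T, following (7.2.17).

    Assumed indexing: d[t] is the log dividend at date t for t = 0..T, and
    p_T is the actual end-of-sample log price used as the terminal value.
    """
    d = np.asarray(d, dtype=float)
    T = len(d) - 1
    p_star = np.empty(T + 1)
    p_star[T] = p_T                             # terminal condition p*_{T|T} = p_T
    for t in range(T - 1, -1, -1):
        # one step of the discounted sum: current term plus rho times the next value
        p_star[t] = k + (1.0 - rho) * d[t + 1] - r_bar + rho * p_star[t + 1]
    return p_star
```

The recursion makes the point in the text concrete: $p_{t|T}^*$ moves only as dividend terms drop out and the discounting horizon shrinks, which is why it is much smoother than $p_t$.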

As one might expect, the asymptotic theory for statistical inference in orthogonality and variance-bounds tests is essentially the same as the theory used to conduct statistical inference in long-horizon return regressions.¹⁵ As always, in finite samples it is important to look at the effective order of overlap (that is, the number of periods in (7.2.9) during which discounted future returns make a nonnegligible contribution to today's value of $p_{t|T}^* - p_t$). If this is large relative to the sample size, then asymptotic theory is unlikely to be a reliable guide for statistical inference.

Flavin (1983) gives a particularly clear intuition for why this might be a problem in the context of variance-bounds tests. She points out that whenever a sample variance around a sample mean is used to estimate a population variance, there is some downward bias caused by the fact that the true mean of the process is unknown. When the process is white noise, it is well known that this bias can be corrected by dividing the sum of squares by T − 1 instead of T. Unfortunately, the downward bias is more severe for serially correlated processes (intuitively, there is a smaller number of effective observations for these processes), so this correction becomes inadequate in the presence of serial correlation. Now $p_{t|T}^*$ is more highly serially correlated than $p_t$, since $p_{t|T}^*$ changes only as dividends drop out of the present-value formula and discount factors are updated, while $p_t$ is affected by new information about dividends. Thus the ratio of the sample variance of $p_{t|T}^*$ to the sample variance of $p_t$ is downward-biased, and this can cause the variance inequality in (7.2.12) to be violated too often in finite samples. From the equivalence of variance-bounds and orthogonality tests, the same problem arises in a regression context.
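Flavin's point about the downward bias of sample variances under serial correlation can be illustrated with a small Monte Carlo experiment. The sketch assumes a stationary Gaussian AR(1) process, a stand-in chosen for illustration rather than a model of $p_{t|T}^*$, and compares the average of the usual T − 1 sample variance with the true variance for white noise and for a persistent process. The function name and parameter values are illustrative.

```python
import numpy as np

def mean_sample_variance_ratio(phi, T, reps=2000, seed=0):
    """Average of s^2 / gamma_0 over simulated stationary Gaussian AR(1) samples.

    Illustrative stand-in process. s^2 divides by T - 1, which removes the
    bias only when observations are serially uncorrelated (phi = 0).
    """
    rng = np.random.default_rng(seed)
    gamma0 = 1.0 / (1.0 - phi**2)                            # variance with unit innovations
    x = np.empty((reps, T))
    x[:, 0] = np.sqrt(gamma0) * rng.standard_normal(reps)    # stationary start
    for t in range(1, T):
        x[:, t] = phi * x[:, t - 1] + rng.standard_normal(reps)
    s2 = x.var(axis=1, ddof=1)                               # usual T - 1 sample variance
    return s2.mean() / gamma0

print(mean_sample_variance_ratio(0.0, 50))   # close to one
print(mean_sample_variance_ratio(0.9, 50))   # well below one
```

With $\phi = 0.9$ and T = 50 the average ratio falls far short of one even though the white-noise case is essentially unbiased, so the T − 1 correction is not enough for a smooth series.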


7.2.3 Vector Autoregressive Methods


The methods discussed in the previous two sections have the common fea- 
ture that they try to look directly at long-horizon properties of the data. This 


¹⁵LeRoy and Steigerwald (1992) use Monte Carlo methods to study the power of orthogonality and variance-bounds tests.
can lead to statistical difficulties in finite samples. An alternative approach is to assume that the dynamics of the data are well described by a simple time-series model; long-horizon properties can then be imputed from the short-run model rather than estimated directly. In the variance-bounds literature, this is the approach of LeRoy and Porter (1981). These authors note that a variance-bounds test does not require observations of the perfect-foresight price $p_t^*$ itself; it merely requires an estimate of the variance of $p_t^*$, which can be obtained from a univariate time-series model for dividends. West (1988b) develops a variant of this procedure.

To see how this approach can work, suppose that one observes the complete vector of state variables $\mathbf{x}_t$ used by market participants, and that $\mathbf{x}_t$ follows a vector autoregressive (VAR) process. Any VAR model can be written in first-order form by augmenting the state vector with suitable lags of the original variables, so without loss of generality we write:


$$x_{t+1} = A x_t + \epsilon_{t+1}. \qquad (7.2.18)$$


Here $A$ is a matrix of VAR coefficients, and $\epsilon_{t+1}$ is a vector of shocks to the
VAR. We have dropped constants for simplicity; one can think of the state
vector as including demeaned variables.
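The claim that any VAR can be put in first-order form can be made concrete with the standard companion-matrix construction. This is a minimal sketch, assuming a VAR($p$) $y_{t+1} = A_1 y_t + \cdots + A_p y_{t-p+1} + u_{t+1}$ in demeaned variables; the function name `companion` and the AR(2) coefficients are ours, chosen for illustration.

```python
import numpy as np

def companion(coefs):
    """Stack VAR(p) coefficient matrices [A_1, ..., A_p], each (n x n), into the
    (n*p x n*p) matrix A of the first-order form x_{t+1} = A x_t + eps_{t+1},
    where x_t = (y_t, y_{t-1}, ..., y_{t-p+1})."""
    p = len(coefs)
    n = coefs[0].shape[0]
    top = np.hstack(coefs)                          # first block row: A_1 ... A_p
    if p == 1:
        return top
    # Identity blocks shift y_t, ..., y_{t-p+2} down one slot in the state vector
    shift = np.hstack([np.eye(n * (p - 1)), np.zeros((n * (p - 1), n))])
    return np.vstack([top, shift])

# Example: a univariate AR(2), y_{t+1} = 0.5 y_t + 0.3 y_{t-1} + u_{t+1}
A = companion([np.array([[0.5]]), np.array([[0.3]])])
x_t = np.array([2.0, 1.0])   # (y_t, y_{t-1})
x_next = A @ x_t             # (E_t y_{t+1}, y_t)
```

The shock vector of the first-order system has zeros in the lagged slots, so only the first block of $\epsilon_{t+1}$ is random.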

Equation (7.2.18) implies that multiperiod forecasts of the state vector
$x_t$ can be formed by matrix multiplication:


$$E_t[x_{t+1+j}] = A^{j+1} x_t. \qquad (7.2.19)$$


This makes it easy to calculate the long-horizon forecasts that determine 
prices in (7.1.22) and (7.1.23), or the revisions of long-horizon forecasts 
that determine returns in (7.1.25) and (7.1.26). 
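As a sketch of (7.2.19), the function below forms $E_t[x_{t+1+j}]$ by raising $A$ to the power $j+1$ and compares this with iterating the one-step forecast. The two-variable coefficient matrix and the function name `var_forecast` are hypothetical.

```python
import numpy as np

def var_forecast(A, x_t, j):
    """E_t[x_{t+1+j}] = A^(j+1) x_t for the demeaned VAR x_{t+1} = A x_t + eps_{t+1}."""
    return np.linalg.matrix_power(A, j + 1) @ x_t

A = np.array([[0.5, 0.2],     # hypothetical VAR coefficients
              [0.0, 0.9]])
x_t = np.array([1.0, -0.5])

direct = var_forecast(A, x_t, 3)   # four-step-ahead forecast E_t[x_{t+4}]
iterated = x_t.copy()
for _ in range(4):                 # apply the one-step forecast four times
    iterated = A @ iterated
```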


Vector Autoregressions and Price Volatility 

As a first example, suppose that the state vector includes the stock price
$p_t$ as its first element and the dividend $d_t$ as its second element, while the
remaining elements are other relevant forecasting variables. We define
vectors $e1' = [1\ 0\ 0\ \ldots\ 0]$ and $e2' = [0\ 1\ 0\ \ldots\ 0]$. These vectors pick
out the first element ($p_t$) and the second element ($d_t$) from the state vector
$x_t$. Using these definitions and equations (7.1.22), (7.1.23), and (7.2.19),
the dividend component of the stock price is


$$p'_{dt} = (1-\rho)\sum_{j=0}^{\infty}\rho^j\, e2' A^{j+1} x_t = (1-\rho)\, e2' A (I-\rho A)^{-1} x_t. \qquad (7.2.20)$$


The stock price itself is $p_t = e1'x_t$, so the expected-return component of
the stock price is the difference between the two. If expected returns are
constant, then $p_t = p'_{dt}$, which imposes the restriction

$$e1' = (1-\rho)\, e2' A (I-\rho A)^{-1} \qquad (7.2.21)$$

on the VAR system. This can be tested using a nonlinear Wald test.
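The quantities in (7.2.20) and (7.2.21) are easy to compute once $A$ and $\rho$ are given. The sketch below assumes, as in the text, that the price is the first element of $x_t$ and the dividend the second; the coefficient matrix, $\rho = 0.96$, and the function names are hypothetical. It also checks the closed form against a truncated version of the infinite sum, which requires the eigenvalues of $\rho A$ to lie inside the unit circle. A Wald test would compare the estimated gap with its sampling variability; that step is not shown.

```python
import numpy as np

def dividend_component_coefs(A, rho):
    """Row vector c' = (1 - rho) e2' A (I - rho A)^{-1} from (7.2.20), so p'_dt = c' x_t."""
    n = A.shape[0]
    e2 = np.eye(n)[1]
    # c'(I - rho A) = (1 - rho) e2' A  <=>  (I - rho A)' c = (1 - rho) A' e2
    return np.linalg.solve((np.eye(n) - rho * A).T, (1 - rho) * A.T @ e2)

def restriction_gap(A, rho):
    """e1' - (1 - rho) e2' A (I - rho A)^{-1}: zero under the null (7.2.21)."""
    return np.eye(A.shape[0])[0] - dividend_component_coefs(A, rho)

# Hypothetical demeaned VAR in (p_t, d_t, z_t); numbers are for illustration only
A = np.array([[0.80, 0.05, 0.10],
              [0.02, 0.90, 0.00],
              [0.00, 0.10, 0.60]])
rho = 0.96
coefs = dividend_component_coefs(A, rho)
gap = restriction_gap(A, rho)

# Truncated sum (1 - rho) sum_j rho^j e2' A^(j+1); row 1 of A^(j+1) equals e2' A^(j+1)
truncated = sum((1 - rho) * rho**j * np.linalg.matrix_power(A, j + 1)[1]
                for j in range(500))
```

Solving the linear system rather than forming $(I-\rho A)^{-1}$ explicitly is a numerical convenience; both give the same row vector.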

So far we have assumed that the vector $x_t$ includes all the relevant variables
observed by market participants. Fortunately this very strong assumption
can be relaxed. Even if $x_t$ includes only a subset of the relevant information,
under the constant-expected-return null hypothesis the stock
price $p_t$ should still equal the best VAR forecast of the discounted value
of future dividends as given on the right-hand side of (7.2.21). Intuitively,
when the null hypothesis is true the stock price perfectly reveals investors' 
information about the discounted value of future dividends. Another way 
to see the same point is to interpret the restriction (7.2.21) as enforcing 
the unforecastability of multiperiod stock returns. If multiperiod returns 
are unforecastable given investors' information, they will also be unfore- 
castable given any smaller set of information variables, and thus the VAR 
test of (7.2.21) is a valid test of the null hypothesis. 

One can also show that (7.2.21) is a nonlinear transformation of the 
restrictions implied by the unforecastability of single-period stock returns. 
In the VAR system the single-period stock return is unforecastable if and
only if

$$e1'(I-\rho A) = (1-\rho)\, e2' A, \qquad (7.2.22)$$


which is obtained from (7.2.21) by postmultiplying each side by $(I-\rho A)$. The
economic meaning of this is that multiperiod returns are unforecastable
if and only if one-period returns are unforecastable. However, Wald test
statistics are sensitive to nonlinear transformations of hypotheses; thus Wald 
tests of the VAR coefficient restrictions may behave differently when the 
restrictions are stated in the infinite-horizon form (7.2.21) than when they 
are stated in the single-period form (7.2.22). An interesting question for 
future research is how alternative VAR test statistics behave in simple models 
of time-varying expected returns such as the AR(1) example developed in 
this chapter. 
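A numerical sketch of the relation between the two forms of the restriction. Assuming a hypothetical three-variable VAR with $\rho = 0.96$, the code fills in the price row of $A$ so that the single-period form (7.2.22) holds exactly, confirms that the infinite-horizon form (7.2.21) then holds too, and shows that away from the null the infinite-horizon gap equals the one-period gap postmultiplied by $(I-\rho A)^{-1}$. That $A$-dependent postmultiplication is the nonlinear transformation that can make Wald statistics for the two forms behave differently. The function names are ours.

```python
import numpy as np

def gap_infinite_horizon(A, rho):
    """Deviation from (7.2.21): e1' - (1 - rho) e2' A (I - rho A)^{-1}."""
    n = A.shape[0]
    e1, e2 = np.eye(n)[0], np.eye(n)[1]
    return e1 - (1 - rho) * e2 @ A @ np.linalg.inv(np.eye(n) - rho * A)

def gap_one_period(A, rho):
    """Deviation from (7.2.22): e1'(I - rho A) - (1 - rho) e2' A."""
    n = A.shape[0]
    e1, e2 = np.eye(n)[0], np.eye(n)[1]
    return e1 @ (np.eye(n) - rho * A) - (1 - rho) * e2 @ A

rho = 0.96
A = np.zeros((3, 3))
A[1] = [0.10, 0.85, 0.05]      # hypothetical dividend equation
A[2] = [0.00, 0.10, 0.60]      # hypothetical forecasting-variable equation
# Price equation chosen so that (7.2.22) holds: e1'A = (e1' - (1 - rho) e2'A) / rho
A[0] = (np.eye(3)[0] - (1 - rho) * A[1]) / rho

# Move away from the null by perturbing one price-equation coefficient
A_alt = A.copy()
A_alt[0, 2] += 0.01
```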

An important caveat is that when the constant-expected-return null hypothesis
is false, the VAR estimate of $p'_{dt}$ will in general depend on the
information included in the VAR. Thus one should be cautious in interpreting
VAR estimates that reject the null hypothesis. As an example, consider
an updated version of the VAR system used by Campbell and Shiller (1988a)
which includes the log dividend-price ratio and the real log dividend growth 
rate. These variables are used in place of the real log price and log dividend 


For details see Campbell and Shiller (1987, 1988a,b). Campbell and Shiller also show 
how to test other models of expected stock returns in the VAR framework. 
Figure 7.1. Log Real Stock Price and Dividend Series, Annual US Data, 1872 to 1994


in order to ensure that the VAR system is stationary. The system is estimated
with four lags using annual data from 1871 to 1994. Figure 7.1 shows the
real log stock price as a solid line and the real log dividend as a dashed line.
Both series have been demeaned so that the sample mean in the figure is
zero. The two lines tend to move together, but the movements of the log
stock price are larger than those of the log dividend; thus the price-dividend
ratio is procyclical and the dividend-price ratio is countercyclical. Figure 7.2
again shows the demeaned real log stock price as a solid line, but now the
dashed line is the demeaned VAR estimate of $p'_{dt}$, the present value of future
dividends. Because the real log dividend is close to a random walk, the
VAR estimate of $p'_{dt}$ is close to $d_t$ itself, and thus the variation in $p_t$ is larger
than that in $p'_{dt}$. Figure 7.3 shows the log dividend-price ratio as a solid line
and the log ratio of dividend to $p'_{dt}$ as a dashed line. The present-value
model with constant discount rates explains very little of the variation in
stock prices relative to dividends.

While this general conclusion is robust, the smoothness of $p'_{dt}$ is sensitive
to the specification of the VAR system. In a low-order system with four lags




These series have not been demeaned; the level of the dividend-price ratio can be
recovered from the figure by exponentiating the plotted solid line.
Figure 7.2. Log Real Stock Price and Estimated Dividend Component, Annual US Data,
1876 to 1994


or fewer, $p'_{dt}$ moves closely with $d_t$. As one increases the lag length towards
ten lags, $p'_{dt}$ becomes much smoother and more like a trend line. The same
thing happens if one adds to the VAR system a ratio of price to a 10-year or 
30-year moving average of earnings, as suggested by Campbell and Shiller 
(1988b). There appears to be some long-run mean-reversion in dividend 
growth which is captured by these expanded VAR systems. 

In conclusion, the VAR approach strongly suggests that the stock market 
is too volatile to be consistent with the view that stock prices are optimal fore- 
casts of future dividends discounted at a constant rate. Some VAR systems 
suggest that the optimal dividend forecast is close to the current dividend, 
others that the optimal dividend forecast is even smoother than the cur- 
rent dividend; neither type of system can account for the tendency of stock 
prices to move more than one-for-one with dividends. Strictly speaking,
however, once the null hypothesis $p_t = p'_{dt}$ is rejected, any interpretation


Barsky and De Long (1993) point out that stock price behavior could be rationalized
if there were a unit-root component in dividend growth, but they do not present any direct
econometric evidence for such a component. Donaldson and Kunst (1996) argue that
a nonlinear dividend forecasting model delivers more volatile forecasts of long-run future
dividends. This remains an active research area.
Figure 7.3. Log Dividend-Price Ratio and Estimated Dividend Component, Annual US
Data, 1876 to 1994


of the behavior of $p'_{dt}$ is conditional on the information variables included
in the VAR. 


Vector Autoregressions and Return Volatility

A common criticism of volatility tests, which applies equally to VAR systems
including prices and dividends, is that the time-series process driving dividends
may not be stable through time. Fortunately it is possible to analyze
the variability of stock returns without modeling the dividend process. Consider
a state vector $x_t$ whose first element is the one-period stock return and
whose other elements are relevant forecasting variables for returns. With
this system the unexpected stock return becomes


$$r_{t+1} - E_t[r_{t+1}] = e1'\epsilon_{t+1}. \qquad (7.2.23)$$


The Modigliani-Miller theorem says that firm-value maximization by managers does not
constrain the form of dividend policy. Lehmann (1991) uses this to argue that the stochastic
process describing dividends is unlikely to be stable.
Recall that the revision in expectations of all future stock returns, $\eta_{h,t+1}$, is
defined as

$$\eta_{h,t+1} = E_{t+1}\left[\sum_{j=1}^{\infty}\rho^j r_{t+1+j}\right] - E_t\left[\sum_{j=1}^{\infty}\rho^j r_{t+1+j}\right].$$


This becomes

$$\eta_{h,t+1} = e1'\sum_{j=1}^{\infty}\rho^j A^j \epsilon_{t+1} = e1'\rho A (I-\rho A)^{-1}\epsilon_{t+1}. \qquad (7.2.24)$$


From (7.1.26) the revision in expectations of future dividends, $\eta_{d,t+1}$, can
be treated as a residual:

$$\eta_{d,t+1} = \left(r_{t+1} - E_t[r_{t+1}]\right) + \eta_{h,t+1} = e1'\left(I + \rho A (I-\rho A)^{-1}\right)\epsilon_{t+1}.$$
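The mappings in (7.2.24) and the residual formula for $\eta_{d,t+1}$ can be written as two row vectors that multiply the VAR shock. The sketch assumes the one-period return is the first element of $x_t$; the coefficient matrix, $\rho = 0.99$, the shock vector, and the function name `news_loadings` are hypothetical. It checks two identities implied by the algebra above: dividend news minus expected-return news equals the unexpected return $e1'\epsilon_{t+1}$, and $I + \rho A(I-\rho A)^{-1} = (I-\rho A)^{-1}$.

```python
import numpy as np

def news_loadings(A, rho):
    """Return (load_h, load_d) with eta_h = load_h @ eps and eta_d = load_d @ eps."""
    n = A.shape[0]
    e1 = np.eye(n)[0]
    lam = rho * A @ np.linalg.inv(np.eye(n) - rho * A)   # rho A (I - rho A)^{-1}
    load_h = e1 @ lam                                    # expected-return news, (7.2.24)
    load_d = e1 @ (np.eye(n) + lam)                      # dividend news, as a residual
    return load_h, load_d

A = np.array([[0.10, 0.30, 0.00],    # hypothetical VAR: return, forecasting variables
              [0.00, 0.95, 0.00],
              [0.00, 0.00, 0.50]])
rho = 0.99
eps = np.array([0.04, -0.03, 0.01])  # hypothetical shock vector

load_h, load_d = news_loadings(A, rho)
eta_h = load_h @ eps
eta_d = load_d @ eps
unexpected_return = eps[0]           # e1' eps, as in (7.2.23)
```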


As a concrete example, consider an updated version of the system estimated
by Campbell (1991) in which the state vector has three elements: the real
stock return ($x_1$), the log dividend-price ratio ($x_2$), and the level of the
stochastically detrended short-term interest rate ($x_3$). Using monthly data
over the period 1952:1 to 1994:12, the estimated first-order VAR for these
variables, with asymptotic standard errors in parentheses, is


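$$
\begin{bmatrix} x_{1,t+1} \\ x_{2,t+1} \\ x_{3,t+1} \end{bmatrix}
=
\begin{bmatrix}
\underset{(0.053)}{0.055} & \underset{(0.230)}{0.655} & \underset{(0.166)}{-0.520} \\
\underset{(0.001)}{-0.038} & \underset{(0.003)}{0.999} & \underset{(0.002)}{-0.000} \\
\underset{(0.011)}{-0.032} & \underset{(0.046)}{-0.040} & \underset{(0.050)}{0.707}
\end{bmatrix}
\begin{bmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \end{bmatrix}
+
\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \\ \epsilon_{3,t+1} \end{bmatrix}.
\qquad (7.2.25)
$$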


The matrix in (7.2.25) is a numerical example of the VAR coefficient matrix
$A$. The $R^2$ statistics for the three regressions summarized in (7.2.25) are
0.040, 0.996, and 0.537, respectively, indicating a modest degree of forecastability
for monthly stock returns.


As the coefficient estimates suggest, the log dividend-price ratio has a root extremely close
to unity in this sample period. Although there are theoretical reasons for believing that the log
dividend-price ratio is stationary, and unit-root tests reject a unit root over long sample periods
in US data, the persistence of the log dividend-price ratio does lead to inference problems in
the VAR system. Hodrick (1992) is a careful Monte Carlo study of these problems.
We can substitute the estimated $A$ matrix into the formula (7.2.24) and
use the estimated variance-covariance matrix of the error vector $\epsilon_{t+1}$ to
calculate the sample variances and covariance of the expected-return and
dividend components of stock returns. The estimated expected-return component
has a sample variance equal to 0.75 times the variance of realized
returns, while the estimated dividend component has a sample variance
only 0.12 times the variance of realized returns. The remaining variance of
realized returns (0.13 of the total) is attributed to covariance between the
expected-return and dividend components.

The reason for this result is that the log dividend-price ratio forecasts
stock returns, and it is itself a highly persistent process. Thus revisions in the
log dividend-price ratio are associated with persistent changes in expected
future returns, and this can justify large changes in stock prices. The estimated
VAR process is somewhat more complicated than the simple AR(1)
example developed earlier in this chapter, for it includes two forecasting
variables, each of which is close to a univariate AR(1). However, the main
effect of the interest-rate variable is to increase the forecastability of one-period
stock returns; it has a rather modest effect on the long-run behavior
of the system, which is dominated by the persistent movements of the log
dividend-price ratio. Thus the long-run properties of the VAR system are
similar to those of the AR(1) example. From this and our previous analysis,
one would expect that VAR systems like (7.2.25) could account for the pattern
of long-horizon regression results, and this indeed seems to be the case
as shown by Campbell (1991), Hodrick (1992), and Kandel and Stambaugh
(1989). Of course, VAR systems impose more structure on the data; but
Hodrick (1992) presents some Monte Carlo evidence that when the structure
is correct, the finite-sample behavior of VAR systems is correspondingly
better than that of long-horizon regressions with a large horizon relative to
the sample size.


7.3 Conclusion 


The research described in this chapter has helped to transform the way financial
economists view asset markets. It used to be thought that expected
returns were approximately constant and that movements in prices
could be attributed to news about future cash payments to investors. Today
the importance of time-variation in expected returns is widely recognized,


Over the longer period 1926 to 1988, the two variances and the covariance have roughly
equal shares of the overall variance of realized stock returns. Asymptotic standard errors for the
variance decomposition can be calculated using the delta method explained in Section A.4 of
the Appendix. As in the price-dividend VAR discussed above, the decomposition is conditional
on the information variables included in the VAR system.




and this has broad implications for both academics and investment
professionals.

At the academic level, there is an explosion of research on the determinants
of time-varying expected returns. Economists are exploring a great
variety of ideas, from macroeconomic models of real business cycles to more
heterodox models of investor psychology. We discuss some of these ideas
in Chapter 8. At a more practical level, dynamic asset-allocation models
are becoming increasingly popular. The techniques discussed here can
provide quantitative inputs for these investment strategies. In this context
long-horizon return regressions may be attractive not only for their poten- 
tial statistical advantages, but also because investment strategies based on 
long-horizon return forecasts are likely to incur lower transactions costs. 


Problems—Chapter 7 


7.1 In the late 1980s corporations began to repurchase shares on a large
scale. In this problem you are asked to analyze the effect of repurchases on
the relation between stock prices and dividends.

Consider a firm with fixed cash flow per period, $X$. The total market
value of the firm (including the current cash flow $X$) is $V$. This is the
present value of current and future cash flow, discounted at a constant rate
$R$: $V = (1+R)X/R$. Each period, the firm uses a fraction $\lambda$ of its cash
flow to repurchase shares at cum-dividend prices, and then uses a fraction
$(1-\lambda)$ of its cash flow to pay dividends on the remaining shares. The firm
has $N_1$ shares outstanding at the beginning of period 1 (before it repurchases
shares).


7.1.1 What are the cum-dividend price per share and dividend per share
at time $t$?


7.1.2 Derive a relation between the dividend-price ratio, the growth rate
of dividends per share $G$, and the discount rate $R$.


7.1.3 Show that the price per share equals the expected present value
of dividends per share, discounted at rate $R$. Explain intuitively why this
formula is correct, even though the firm is devoting only a portion of its 
cash flow to dividends. 


7.2 Consider a stock whose log dividend $d_t$ follows a random walk with
drift:

$$d_{t+1} = \mu + d_t + \epsilon_{t+1},$$

where $\epsilon_{t+1} \sim N(0, \sigma^2)$. Assume that the required log rate of return on the
stock is a constant $r$.
7.2.1 We use the notation $F_t$ (for "fundamental value") to denote the
expected present value of dividends, discounted using the required rate
of return. Show that $F_t$ is a constant multiple of the dividend $D_t$. Write
the ratio $F_t/D_t$ as a function of the parameters of the model.


7.2.2 Show that another formula for the stock price which gives the same 
expected rate of return is 


where $\lambda > 0$ is a function of the other parameters of the model. Solve
for $\lambda$.


7.2.3 Discuss the strengths and weaknesses of this model of a rational
bubble as compared with the Blanchard-Watson bubble (7.1.16) in the
text.

Note: This problem is based on Froot and Obstfeld (1991). 


7.3 Consider a stock whose expected return obeys

$$E_t[r_{t+1}] = r + x_t. \qquad (7.1.27)$$

Assume that $x_t$ follows an AR(1) process,

$$x_{t+1} = \phi x_t + \xi_{t+1}, \qquad 0 < \phi < 1. \qquad (7.1.28)$$


7.3.1 Assume that $\xi_{t+1}$ is uncorrelated with news about future dividend
payments on the stock. Using the loglinear approximate framework developed
in Section 7.1.3, derive the autocovariance function of realized
stock returns. Assume that $\phi < \rho$, where $\rho$ is the parameter of linearization
in the loglinear framework. Show that the autocovariances of
stock returns are all negative and die off at rate $\phi$. Give an economic
interpretation of your formula for return autocovariances.


7.3.2 Now allow $\xi_{t+1}$ to be correlated with news about future dividend
payments. Show that the autocovariances of stock returns can be positive
if $\xi_{t+1}$ and dividend news have a sufficiently large positive covariance.
7.4 Suppose that the log "fundamental value" of a stock, $v_t$, obeys the
process

$$v_t = \mu + v_{t-1} + \frac{1-\rho}{\rho}\left(v_{t-1} - d_t\right) + \epsilon_t,$$

where $\mu$ is a constant, $\rho$ is the parameter of linearization defined in Section
7.1.3, $d_t$ is the log dividend on the asset, and $\epsilon_t$ is a white noise error term.

7.4.1 Show that if the price of the stock equals its fundamental value,
then the approximate log stock return defined in Section 7.1.3 is unforecastable.
7.4.2 Now suppose that the managers of the company pay dividends
according to the rule

$$d_t = c + \lambda v_t + (1-\lambda) d_{t-1} + u_t,$$

where $c$ and $\lambda$ are constants ($0 < \lambda < 1$), and $u_t$ is a white noise error
uncorrelated with $\epsilon_t$. Managers partially adjust dividends towards fundamental
value, where the speed of adjustment is given by $\lambda$. Marsh and
Merton (1986) have argued for the empirical relevance of this dividend
policy. Show that if the price of the stock equals its fundamental value, the
log dividend-price ratio follows an AR(1) process. What is the persistence
of this process as a function of $\lambda$ and $\rho$?


7.4.3 Now suppose that the stock price does not equal fundamental
value, but rather satisfies $p_t - v_t = \gamma(v_t - d_t)$, where $\gamma > 0$. That is, price
exceeds fundamental value whenever fundamental value is high relative
to dividends. Show that the approximate log stock return and the log
dividend-price ratio satisfy the AR(1) model (7.1.27) and (7.1.28), where
the optimal forecaster of the log stock return, $x_t$, is a positive multiple of
the log dividend-price ratio.


7.4.4 Show that in this example innovations in stock returns are negatively
correlated with innovations in $x_t$.


7.5 Recall the definition of the perfect-foresight stock price:

$$p_t^* = \sum_{j=0}^{\infty}\rho^j\left[(1-\rho)d_{t+1+j} + k - r\right]. \qquad (7.2.8)$$

The hypothesis that expected returns are constant implies that the actual
stock price $p_t$ is a rational expectation of $p_t^*$, given investors' information.
Now consider forecasting dividends using a smaller information set $J_t$. Define
$\hat{p}_t = E[p_t^* \mid J_t]$.


7.5.1 Show that $\mathrm{Var}(p_t^*) \geq \mathrm{Var}(\hat{p}_t)$. Give some economic intuition for
this result.


7.5.2 Show that $\mathrm{Var}(p_t^* - \hat{p}_t) \geq \mathrm{Var}(p_t - \hat{p}_t)$ and that
$\mathrm{Var}(p_t^* - \hat{p}_t) \geq \mathrm{Var}(p_t^* - p_t)$. Give some economic intuition for these results. Discuss
circumstances where these variance inequalities can be more useful than
the inequality in part 7.5.1.
7.5.3 Now define $\hat{r}_{t+1} = k + \rho\hat{p}_{t+1} + (1-\rho)d_{t+1} - \hat{p}_t$; $\hat{r}_{t+1}$ is the return that
would prevail under the constant-expected-return model if dividends were
forecast using the information set $J_t$. Show that $\mathrm{Var}(\hat{r}_{t+1}) \geq \mathrm{Var}(r_{t+1})$.
Give some economic intuition for this result and discuss circumstances
where it can be more useful than the inequality in part 7.5.1.


Note: This problem is based on Mankiw, Romer, and Shapiro (1985) and
West (1988b).


Intertemporal Equilibrium Models 


THIS CHAPTER RELATES asset prices to the consumption and savings deci- 
sions of investors. The static asset pricing models discussed in Chapters 
5 and 6 ignore consumption decisions. They treat asset prices as being 
determined by the portfolio choices of investors who have preferences de- 
fined over wealth one period in the future. Implicitly these models assume 
that investors consume all their wealth after one period, or at least that
wealth uniquely determines consumption so that preferences defined over 
consumption are equivalent to preferences defined over wealth. This sim- 
plification is ultimately unsatisfactory. In the real world investors consider 
many periods in making their portfolio decisions, and in this intertemporal 
setting one must model consumption and portfolio choices simultaneously.

Intertemporal equilibrium models of asset pricing have the potential 
to answer two questions that have been left unresolved in earlier chapters.
First, what forces determine the riskless interest rate (or more generally the 
rate of return on a zero-beta asset) and the rewards that investors demand for 
bearing risk? In the CAPM the riskless interest rate or zero-beta return and 
the reward for bearing market risk are exogenous parameters; the model
gives no account of where they come from. In the APT the single price
of market risk is replaced by a vector of factor risk prices, but again the
risk prices are determined outside the model. We shall see in this chapter
that intertemporal models can yield insights into the determinants of these 
parameters. 

A second, related question has to do with predictable variations in asset
returns. The riskless real interest rate moves over time, and in Chapter 7 we
presented evidence that forecasts of stock returns also move over time. Importantly,
excess stock returns appear to be just as forecastable as real stock
returns, suggesting that the reward for bearing stock market risk changes
over time. Are these phenomena consistent with market efficiency? Is it
possible to construct a model with rational, utility-maximizing investors in
which the equilibrium return on risky assets varies over time in the way described
in Chapter 7? We shall use intertemporal equilibrium models to
explore these questions.

Section 8.1 begins by stating the proposition that there exists a stochastic
discount factor such that the expected product of any asset return with the
stochastic discount factor equals one. This proposition holds very generally
in models that rule out arbitrage opportunities in financial markets.
Equilibrium models with optimizing investors imply tight links between the
stochastic discount factor and the marginal utilities of investors' consumption.
Thus by studying the stochastic discount factor one can relate asset
prices to the underlying preferences of investors.

In Section 8.1 we show how the behavior of asset prices can be used 
to reach conclusions about the behavior of the stochastic discount factor. 
In particular we describe Hansen and Jagannathan's (1991) procedure for 
calculating a lower bound on the volatility of the stochastic discount factor, 
given any set of asset returns. Using long-run annual data on US short-term
interest rates and stock returns over the period 1889 to 1994, we estimate
the standard deviation of the stochastic discount factor to be 30% per year
or more.
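For a single excess return $Z$, a bound of this kind follows from $E[ZM] = 0$: since $E[Z]\,E[M] = -\mathrm{Cov}[Z, M]$ and $|\mathrm{Cov}[Z,M]| \le \sigma(Z)\sigma(M)$, we have $\sigma(M) \ge E[M]\,|E[Z]|/\sigma(Z)$. The sketch below computes this one-asset version only (Hansen and Jagannathan's procedure handles many assets at once, which is not shown) and checks, in a hypothetical three-state example, that a discount factor linear in $Z$ attains the bound. All inputs and the function name `hj_bound` are illustrative, not estimates from the data described in the text.

```python
import numpy as np

def hj_bound(mean_excess, sd_excess, mean_m):
    """Lower bound on sd(M) from one excess return: E[M] * |E[Z]| / sd(Z)."""
    return mean_m * abs(mean_excess) / sd_excess

# Hypothetical three-state excess return and state probabilities
probs = np.array([0.3, 0.4, 0.3])
Z = np.array([-0.20, 0.08, 0.25])
mean_m = 0.98                                    # assumed E[M]

EZ = probs @ Z
var_Z = probs @ (Z - EZ) ** 2
# Linear SDF M = mean_m - b (Z - E[Z]) with b chosen so that E[Z M] = 0
b = mean_m * EZ / var_Z
M = mean_m - b * (Z - EZ)
sd_M = np.sqrt(probs @ (M - probs @ M) ** 2)
bound = hj_bound(EZ, np.sqrt(var_Z), mean_m)
```

A higher Sharpe ratio $E[Z]/\sigma(Z)$ raises the bound one-for-one, which is why assets with large average excess returns relative to their volatility demand a volatile discount factor.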

Consumption-based asset pricing models aggregate investors into a single
representative agent, who is assumed to derive utility from the aggregate
consumption of the economy. In these models the stochastic discount
factor is the intertemporal marginal rate of substitution (the discounted ratio
of marginal utilities in two successive periods) for the representative agent.
The Euler equations (the first-order conditions for optimal consumption and
portfolio choices of the representative agent) can be used to link asset returns
and consumption.

Section 8.2 discusses a commonly used consumption-based model in
which the representative agent has time-separable power utility. In this
model a single parameter governs both risk aversion and the elasticity of intertemporal
substitution, the willingness of the representative agent to adjust
planned consumption growth in response to investment opportunities. In
fact, the elasticity of intertemporal substitution is the reciprocal of risk aversion,
so in this model risk-averse investors must also be unwilling to adjust
their consumption growth rates to changes in interest rates. The model
explains the risk premia on assets by their covariances with aggregate consumption
growth, multiplied by the risk-aversion coefficient for the representative
investor.

Using long-run annual US data, we emphasize four stylized facts. First,
the average excess return on US stocks over short-term debt (the equity
premium) is about 6% per year. Second, aggregate consumption is very
smooth, so covariances with consumption growth are small. Putting these
facts together, the power utility model can only fit the equity premium if the
coefficient of relative risk aversion is very large. This is the equity premium
puzzle of Mehra and Prescott (1985). Third, there are some predictable
movements in short-term real interest rates, but there is little evidence of
accompanying predictable movements in consumption growth. This suggests
that the elasticity of intertemporal substitution is small, which in the
power utility model again implies a large coefficient of relative risk aversion.
Finally, there are predictable variations in excess returns on stocks over
short-term debt which do not seem to be related to changing covariances of
stock returns with consumption growth. These lead formal statistical tests
to reject the power-utility model.
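The arithmetic behind the first two facts can be sketched with the approximate relation stated above: under power utility the equity premium is roughly the coefficient of relative risk aversion times the covariance of stock returns with consumption growth (this ignores Jensen's-inequality terms). Only the 6% premium comes from the text; the standard deviations and correlation below are hypothetical round numbers, and `implied_risk_aversion` is a name chosen for illustration.

```python
def implied_risk_aversion(premium, sd_return, sd_consumption, corr):
    """Solve premium ~= gamma * Cov(return, consumption growth) for gamma."""
    cov = corr * sd_return * sd_consumption
    return premium / cov

# premium from the text; the other inputs are illustrative assumptions
gamma = implied_risk_aversion(premium=0.06, sd_return=0.18,
                              sd_consumption=0.033, corr=0.5)
print(f"implied coefficient of relative risk aversion: {gamma:.1f}")
```

Because consumption growth is smooth, the covariance in the denominator is small, so even a moderate premium implies a large coefficient; halving the assumed consumption volatility would double it.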

In Sections 8.3 and 8.4 we explore some ways in which the basic model 
can be modified to fit these facts. In Section 8.3 we discuss the effects 
of market frictions such as transactions costs, limits on investors’ ability to 
borrow or sell assets short, exogenous variation in the asset demands of some 
investors, and income risks that investors are unable to insure. We argue that 
many plausible frictions make aggregate consumption an inadequate proxy 
for the consumption of stock market investors, and we discuss ways to get 
testable restrictions on asset prices even when consumption is not measured. 
We also discuss a generalization of power utility that breaks the tight link 
between risk aversion and the elasticity of intertemporal substitution. 

In Section 8.4 we explore the possibility that investors have more com- 
plicated preferences than generalized power utility. For example, the utility 
function of the representative agent may be nonseparable between con- 
sumption and some other good such as leisure. We emphasize models in 
which utility is nonseparable over time because investors derive utility from 
the level of consumption relative to a time-varying habit or subsistence level. 
Finally, we consider some unorthodox models that draw inspiration from 
experimental and psychological research. 


8.1 The Stochastic Discount Factor 


We begin our analysis of the stochastic discount factor in the simplest pos- 
sible way, by considering the intertemporal choice problem of an investor 
who can trade freely in asset i and who maximizes the expectation of a 
time-separable utility function:

$$\max\; E_t\left[\sum_{j=0}^{\infty}\delta^j U(C_{t+j})\right], \qquad (8.1.1)$$

where $\delta$ is the time discount factor, $C_{t+j}$ is the investor's consumption in
period $t+j$, and $U(C_{t+j})$ is the period utility of consumption at $t+j$.
One of the first-order conditions or Euler equations describing the investor's
optimal consumption and portfolio plan is

$$U'(C_t) = \delta E_t\left[(1+R_{i,t+1})\, U'(C_{t+1})\right]. \qquad (8.1.2)$$


The left-hand side of (8.1.2) is the marginal utility cost of consuming one
real dollar less at time $t$; the right-hand side is the expected marginal utility
benefit from investing the dollar in asset $i$ at time $t$, selling it at time $t+1$
for $(1+R_{i,t+1})$ dollars, and consuming the proceeds. The investor equates
marginal cost and marginal benefit, so (8.1.2) describes the optimum.
If we divide both the left- and right-hand sides of (8.1.2) by $U'(C_t)$, we
get

$$1 = E_t\left[(1+R_{i,t+1})\, M_{t+1}\right], \qquad (8.1.3)$$

where $M_{t+1} = \delta U'(C_{t+1})/U'(C_t)$. The variable $M_{t+1}$ in (8.1.3) is known as the
stochastic discount factor, or pricing kernel. In the present model it is equivalent
to the discounted ratio of marginal utilities $\delta U'(C_{t+1})/U'(C_t)$, which is called
the intertemporal marginal rate of substitution. Note that the intertemporal
marginal rate of substitution, and hence the stochastic discount factor, are
always positive since marginal utilities are positive.
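A minimal sketch of (8.1.3) in a hypothetical two-state economy, assuming power utility $U'(C) = C^{-\gamma}$ so that $M_{t+1} = \delta (C_{t+1}/C_t)^{-\gamma}$. The parameter values and consumption growth rates are illustrative. The code prices a riskless asset and a claim whose payoff is proportional to consumption growth, so that both satisfy $E_t[(1+R_{i,t+1})M_{t+1}] = 1$ by construction, and the test confirms this along with the positivity of $M$.

```python
import numpy as np

delta, gamma = 0.98, 2.0                  # assumed time discount factor and curvature
probs = np.array([0.5, 0.5])              # probabilities of the two states at t+1
growth = np.array([1.05, 0.99])           # assumed consumption growth C_{t+1}/C_t by state

M = delta * growth ** (-gamma)            # SDF = intertemporal marginal rate of substitution

gross_rf = 1.0 / (probs @ M)              # riskless gross return: 1 = (1 + R_f) E[M]
payoff = growth                           # claim paying C_{t+1}/C_t
price = probs @ (M * payoff)              # price = E[M * payoff]
gross_risky = payoff / price              # state-by-state gross return 1 + R
```

The consumption claim pays most when $M$ is low, so its expected gross return exceeds the riskless one.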

Expectations in (8.1.3) are taken conditional on information available 
at time t; however, by taking unconditional expectations of the left- and 
right-hand sides of (8.1.3) and lagging one period to simplify notation, we 
obtain an unconditional version: 


$$1 = E\left[(1+R_{it})\, M_t\right]. \qquad (8.1.4)$$


These relationships can be rearranged so that they explicitly determine
expected asset returns. Working with the unconditional form for convenience,
we have $E[(1+R_{it})M_t] = E[1+R_{it}]\,E[M_t] + \mathrm{Cov}[R_{it}, M_t]$, so

$$E[1+R_{it}] = \frac{1}{E[M_t]}\left(1 - \mathrm{Cov}[R_{it}, M_t]\right). \qquad (8.1.5)$$
If there is an asset whose unconditional covariance with the stochastic discount
factor is zero (an "unconditional zero-beta" asset), then (8.1.5) implies
that this asset's expected gross return $E[1+R_{0t}] = 1/E[M_t]$. This can
be substituted into (8.1.5) to obtain an expression for the excess return $Z_{it}$
on asset $i$ over the zero-beta return:

$$E[Z_{it}] = E[R_{it} - R_{0t}] = -E[1+R_{0t}]\,\mathrm{Cov}[R_{it}, M_t]. \qquad (8.1.6)$$


This shows that an asset's expected return is greater, the smaller its covari- 
ance with the stochastic discount factor. The intuition behind this result is 
that an asset whose covariance with $M_t$ is small tends to have low returns
when the investor's marginal utility of consumption is high, that is, when
consumption itself is low. Such an asset is risky in that it fails to deliver
wealth precisely when wealth is most valuable to the investor. The investor
therefore demands a large risk premium to hold it.

Although it is easiest to understand (8.1.3) by reference to the intertemporal
choice problem of an investor, the equation can be derived merely
from the absence of arbitrage, without assuming that investors maximize
well-behaved utility functions. We show this in a discrete-state setting with
states $s = 1, \ldots, S$ and assets $i = 1, \ldots, N$. Define $q_i$ as the price of asset $i$ and
$q$ as the $(N\times 1)$ vector of asset prices, and define $X_{si}$ as the payoff of asset $i$
in state $s$ and $X$ as an $(S\times N)$ matrix giving the payoffs of each asset in each
state. Provided that all asset prices are nonzero, we can further define $G$ as
an $(S\times N)$ matrix giving the gross return on each asset in each state. That
is, the typical element of $G$ is $G_{si} = 1 + R_{si} = X_{si}/q_i$.

Now define an $(S\times 1)$ vector $p$ with typical element $p_s$ to be a state
price vector if it satisfies $X'p = q$. An asset can be thought of as a bundle
of state-contingent payoffs, one for each state; the $s$th element of the state
price vector, $p_s$, gives the price of one dollar to be paid in state $s$, and we
represent each asset price as the sum of its state-contingent payoffs times the
appropriate state prices: $q_i = \sum_s p_s X_{si}$. Equivalently, if we divide through by
$q_i$ we get $1 = \sum_s p_s (1+R_{si})$, or $G'p = \iota$, where $\iota$ is an $(N\times 1)$ vector of ones.
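A small sketch of the state-price definition, assuming a hypothetical complete market with three states and three assets (a riskless bond and two risky assets). Asset prices are generated from assumed state prices; the code then solves $X'p = q$ to recover the state price vector, forms $G$ by dividing each column of $X$ by its price, and checks that $G'p = \iota$.

```python
import numpy as np

# Rows are states s, columns are assets i (hypothetical payoffs X_{si})
X = np.array([[1.0, 2.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
p_true = np.array([0.30, 0.40, 0.25])     # assumed state prices used to set asset prices
q = X.T @ p_true                           # q_i = sum_s p_s X_{si}

p = np.linalg.solve(X.T, q)               # recover state prices from X'p = q (X is invertible)
G = X / q                                  # gross returns G_{si} = X_{si} / q_i (column-wise)
```

Solving works here only because $X$ is square and invertible, the complete-markets case; with fewer assets than states the system $X'p = q$ has many solutions.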

An important result is that there exists a positive state price vector if and only if there are no arbitrage opportunities (that is, no available assets or combinations of assets with nonpositive cost today, nonnegative payoffs tomorrow, and a strictly positive payoff in at least one state). Furthermore, if there exists a positive state price vector, then (8.1.3) is satisfied for some positive random variable M. To see this, define $M_s = p_s/\pi_s$, where $\pi_s$ is the probability of state s. For any asset i the relationship $G'p = \iota$ implies

$$1 = \sum_{s=1}^{S} (1 + R_{si})\,p_s = \sum_{s=1}^{S} \pi_s (1 + R_{si})\,M_s = E[(1 + R_i)M], \qquad (8.1.7)$$

which is the static discrete-state equivalent of (8.1.3). $M_s$ is the ratio of the state price of state s to the probability of state s; hence it is positive because state prices and probabilities are both positive.
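The construction $M_s = p_s/\pi_s$ can be traced in a few lines. In this sketch the state prices, probabilities, and payoff matrix are hypothetical; the code prices assets with $q = X'p$, forms G, and checks both $G'p = \iota$ and the expectation form of (8.1.7).

```python
import numpy as np

# Hypothetical inputs: S = 3 states, N = 2 assets (illustrative values only).
p = np.array([0.30, 0.45, 0.20])    # positive state prices
pi = np.array([0.25, 0.50, 0.25])   # state probabilities
X = np.array([[1.0, 0.0],           # X[s, i]: payoff of asset i in state s
              [1.0, 2.0],
              [1.0, 3.0]])

q = X.T @ p                         # asset prices, q = X'p
G = X / q                           # G[s, i] = X[s, i] / q[i] = 1 + R_si
M = p / pi                          # SDF in each state, M_s = p_s / pi_s

print(G.T @ p)                      # G'p: a vector of ones
print(pi @ (G * M[:, None]))        # E[(1 + R_i) M] for each asset: also ones
```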

If $M_s$ is small, then state s is “cheap” in the sense that investors are unwilling to pay a high price to receive wealth in state s. An asset that tends to deliver wealth in cheap states has a return that covaries negatively with M. Such an asset is itself cheap and has a high return on average. This is the intuition for (8.1.6) within a discrete-state framework.

In the discrete-state model, asset markets are complete if for each state s one can combine available assets to get a nonzero payoff in s and zero payoffs in all other states. A further important result is that the state price vector is unique if and only if asset markets are complete. In this case M is unique, but with incomplete markets there may exist many M's satisfying equation (8.1.3). This result can be understood by considering an economy with several utility-maximizing investors. The first-order condition (8.1.2) holds for each investor, so each investor's marginal utilities can be used to construct a stochastic discount factor that prices the assets in the economy. With complete markets, the investors' marginal utilities are perfectly correlated, so they all yield the same, unique stochastic discount factor; with incomplete markets there may be idiosyncratic variation in marginal utilities and hence multiple stochastic discount factors that satisfy (8.1.3).

¹The theory underlying equation (8.1.3) is discussed at length in textbooks such as Ingersoll (1987). The role of conditioning information has been explored by Hansen and Richard (1987).
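The uniqueness result can be illustrated with a hypothetical three-state example. With only two assets the payoff matrix has rank two, so any small multiple of a null-space direction of $X'$ can be added to p without changing asset prices; the sketch below uses NumPy's SVD to find such a direction and produces a second positive state price vector, hence a second valid M. Adding a third asset that completes the market pins p down uniquely. All numbers are illustrative assumptions.

```python
import numpy as np

pi = np.array([0.25, 0.50, 0.25])        # state probabilities (hypothetical)
p_true = np.array([0.30, 0.45, 0.20])    # one positive state price vector

# Incomplete market: 3 states but only 2 assets.
X = np.array([[1.0, 0.0],
              [1.0, 2.0],
              [1.0, 3.0]])
q = X.T @ p_true
complete = np.linalg.matrix_rank(X) == X.shape[0]

# X' has rank 2, so its last right-singular vector spans the null space of X'.
_, _, Vt = np.linalg.svd(X.T)
null_dir = Vt[-1]
p_alt = p_true + 0.05 * null_dir / np.abs(null_dir).max()

M_true, M_alt = p_true / pi, p_alt / pi  # two different valid SDFs
print("complete:", complete)
print(np.allclose(X.T @ p_alt, q), p_alt)

# Complete market: a third asset paying only in state 3 gives X full rank S = 3.
X_c = np.column_stack([X, [0.0, 0.0, 1.0]])
q_c = X_c.T @ p_true
p_unique = np.linalg.solve(X_c.T, q_c)   # the only state price vector
print(p_unique)
```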


8.1.1 Volatility Bounds


Any model of expected asset returns may be viewed as a model of the stochastic discount factor. Before we discuss methods of testing particular models, we ask more generally what asset return data may be able to tell us about the behavior of the stochastic discount factor. Hansen and Jagannathan (1991) have developed a lower bound on the volatility of stochastic discount factors that could be consistent with a given set of asset return data. They begin with the unconditional equation (8.1.4) and rewrite it in vector form as

$$\iota = E[(\iota + R_t)M_t], \qquad (8.1.8)$$

where ι is an N-vector of ones and $R_t$ is the N-vector of time-t asset returns, with typical element $R_{it}$.

Hansen and Jagannathan assume that $R_t$ has a nonsingular variance-covariance matrix Ω; in other words, that no asset or combination of assets is unconditionally riskless. There may still exist an unconditional zero-beta asset with gross mean return equal to the reciprocal of the unconditional mean of the stochastic discount factor, but Hansen and Jagannathan assume that if there is such an asset, its identity is not known a priori. Hence they treat the unconditional mean of the stochastic discount factor as an unknown parameter $\bar M$. For each possible $\bar M$, Hansen and Jagannathan form a candidate stochastic discount factor $M_t^*(\bar M)$ as a linear combination of asset returns. They show that the variance of $M_t^*(\bar M)$ places a lower bound on the variance of any stochastic discount factor that has mean $\bar M$ and satisfies (8.1.8).

Hansen and Jagannathan first show how asset pricing theory determines the coefficients $\beta_{\bar M}$ in

$$M_t^*(\bar M) = \bar M + (R_t - E[R_t])'\beta_{\bar M}. \qquad (8.1.9)$$

If $M_t^*(\bar M)$ is to be a stochastic discount factor it must satisfy (8.1.8), so

$$\iota = E[(\iota + R_t)M_t^*(\bar M)].$$

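Expanding the expectation of the product $E[(\iota + R_t)M_t^*(\bar M)]$, we have

$$\begin{aligned}
\iota &= \bar M\,E[\iota + R_t] + \mathrm{Cov}[R_t, M_t^*(\bar M)] \\
&= \bar M\,E[\iota + R_t] + E[(R_t - E[R_t])(M_t^*(\bar M) - \bar M)] \\
&= \bar M\,E[\iota + R_t] + E[(R_t - E[R_t])(R_t - E[R_t])']\,\beta_{\bar M} \\
&= \bar M\,E[\iota + R_t] + \Omega\,\beta_{\bar M}, \qquad (8.1.10)
\end{aligned}$$

where Ω is the unconditional variance-covariance matrix of asset returns. It follows then that

$$\beta_{\bar M} = \Omega^{-1}(\iota - \bar M\,E[\iota + R_t]), \qquad (8.1.11)$$

and the variance of the implied stochastic discount factor is

$$\mathrm{Var}[M_t^*(\bar M)] = \beta_{\bar M}'\,\Omega\,\beta_{\bar M} = (\iota - \bar M\,E[\iota + R_t])'\,\Omega^{-1}(\iota - \bar M\,E[\iota + R_t]). \qquad (8.1.12)$$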
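Equations (8.1.11) and (8.1.12) translate directly into linear algebra. The helper below (the name `hj_bound` is ours) takes a mean vector and covariance matrix of net returns, which in practice would be sample estimates, and returns $\beta_{\bar M}$ and the variance bound for a given $\bar M$. The moments in the usage lines are hypothetical, not the data used in this chapter.

```python
import numpy as np

def hj_bound(mean_R, Omega, M_bar):
    """Hansen-Jagannathan coefficients (8.1.11) and variance bound (8.1.12).

    mean_R : (N,) vector of mean net returns E[R_t]
    Omega  : (N, N) nonsingular covariance matrix of returns
    M_bar  : candidate mean of the stochastic discount factor
    """
    iota = np.ones_like(mean_R)
    gap = iota - M_bar * (iota + mean_R)   # iota - M_bar E[iota + R_t]
    beta = np.linalg.solve(Omega, gap)     # Omega^{-1} gap, without forming the inverse
    var_bound = gap @ beta                 # gap' Omega^{-1} gap
    return beta, var_bound

# Hypothetical annual moments for a stock index and a short-term bill.
mean_R = np.array([0.07, 0.01])
Omega = np.array([[0.0324, 0.0002],
                  [0.0002, 0.0009]])
beta, var_bound = hj_bound(mean_R, Omega, M_bar=0.98)
print(beta, np.sqrt(var_bound))
```

Solving the linear system rather than inverting Ω explicitly is the usual numerically safer choice.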

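The right-hand side of (8.1.12) is a lower bound on the volatility of any stochastic discount factor with mean $\bar M$. To see this, note that any other $M_t(\bar M)$ satisfying (8.1.8) must have the property

$$E[(\iota + R_t)(M_t(\bar M) - M_t^*(\bar M))] = \mathrm{Cov}[R_t, M_t(\bar M) - M_t^*(\bar M)] = 0, \qquad (8.1.13)$$

where the first equality holds because both discount factors have mean $\bar M$.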


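Since $M_t^*(\bar M)$ is just a linear combination of asset returns, it follows that $\mathrm{Cov}[M_t^*(\bar M), M_t(\bar M) - M_t^*(\bar M)] = 0$. Thus

$$\begin{aligned}
\mathrm{Var}[M_t(\bar M)] &= \mathrm{Var}[M_t^*(\bar M)] + \mathrm{Var}[M_t(\bar M) - M_t^*(\bar M)] + 2\,\mathrm{Cov}[M_t^*(\bar M), M_t(\bar M) - M_t^*(\bar M)] \\
&= \mathrm{Var}[M_t^*(\bar M)] + \mathrm{Var}[M_t(\bar M) - M_t^*(\bar M)] \\
&\geq \mathrm{Var}[M_t^*(\bar M)]. \qquad (8.1.14)
\end{aligned}$$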


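In fact, we can go beyond this inequality. Because the same orthogonality implies $\mathrm{Cov}[M_t(\bar M), M_t^*(\bar M)] = \mathrm{Var}[M_t^*(\bar M)]$, we have

$$\mathrm{Var}[M_t(\bar M)] = \frac{\mathrm{Var}[M_t^*(\bar M)]}{\left(\mathrm{Corr}[M_t(\bar M), M_t^*(\bar M)]\right)^2}, \qquad (8.1.15)$$

so a stochastic discount factor can only have a variance close to the lower bound if it is highly correlated with the combination of asset returns $M_t^*(\bar M)$.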
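The decomposition in (8.1.14) and the equality (8.1.15) can be verified on simulated data if sample moments are treated as population moments. In this sketch (simulated normal returns with hypothetical parameters) a second discount factor is built by adding to $M_t^*(\bar M)$ a noise term whose sample mean and sample covariance with every return are forced to zero, so it still satisfies the sample version of (8.1.8). Its variance should then equal the bound divided by its squared correlation with $M_t^*(\bar M)$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M_bar = 5000, 0.98
R = rng.multivariate_normal([0.07, 0.01],
                            [[0.0324, 0.0002], [0.0002, 0.0009]], size=T)

mu = R.mean(axis=0)
Omega = np.cov(R, rowvar=False, ddof=0)       # sample moments used as population moments
iota = np.ones(2)
beta = np.linalg.solve(Omega, iota - M_bar * (iota + mu))
M_star = M_bar + (R - mu) @ beta              # (8.1.9)

# Noise with zero sample mean and zero sample covariance with every return.
D = R - mu
eps = rng.normal(scale=0.2, size=T)
eps = eps - eps.mean()
eps = eps - D @ np.linalg.solve(D.T @ D, D.T @ eps)
M_other = M_star + eps

def pricing_errors(M):
    """Sample analogue of E[(iota + R_t) M_t] - iota; zero for a valid SDF."""
    return ((1.0 + R) * M[:, None]).mean(axis=0) - 1.0

corr = np.corrcoef(M_other, M_star)[0, 1]
print(pricing_errors(M_star), pricing_errors(M_other))
print(M_other.var(), M_star.var() / corr**2)  # equal, as in (8.1.15)
```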
The Benchmark Portfolio

We can restate these results in a more familiar way by introducing the idea of a benchmark portfolio. We first augment the vector of risky assets with an artificial unconditionally riskless asset whose return is $1/\bar M - 1$. Recall that we have proceeded under the assumption that no unconditionally riskless asset exists; but if it were to exist, its return would have to be $1/\bar M - 1$. We then define the benchmark portfolio return as

$$1 + R_{bt}(\bar M) = \frac{M_t^*(\bar M)}{E[M_t^*(\bar M)^2]}. \qquad (8.1.16)$$

It is straightforward to check that this return can be obtained by forming a portfolio of the risky assets and the artificial riskless asset, and that it satisfies the condition (8.1.8) on returns. Problem 8.1 is to prove that $R_{bt}$ has the following properties:


(P1) $R_{bt}$ is mean-variance efficient. That is, no other portfolio has smaller variance and the same mean.

(P2) Any stochastic discount factor $M_t(\bar M)$ has a greater correlation with $R_{bt}$ than with any other portfolio. For this reason $R_{bt}$ is sometimes referred to as a maximum-correlation portfolio (see Breeden, Gibbons, and Litzenberger [1989]).

(P3) All asset returns obey a beta-pricing relation with the benchmark portfolio. That is,

$$E[R_{it}] - \left(\frac{1}{\bar M} - 1\right) = \beta_{ib}\left(E[R_{bt}] - \left(\frac{1}{\bar M} - 1\right)\right), \qquad (8.1.17)$$

where $\beta_{ib} = \mathrm{Cov}[R_{it}, R_{bt}]/\mathrm{Var}[R_{bt}]$. When an unconditional zero-beta asset exists, then it can be substituted into (8.1.17) to get a conventional beta-pricing equation.
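Continuing in the same spirit (simulated returns, hypothetical parameters, sample moments treated as population moments), the sketch below forms the benchmark return from (8.1.16), checks that it satisfies (8.1.8), and checks the beta-pricing relation (8.1.17) asset by asset, with the artificial riskless net return $1/\bar M - 1$ in the role of the zero-beta rate.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M_bar = 5000, 0.98
R = rng.multivariate_normal([0.07, 0.03, 0.01],
                            [[0.0324, 0.0030, 0.0002],
                             [0.0030, 0.0100, 0.0001],
                             [0.0002, 0.0001, 0.0009]], size=T)

mu = R.mean(axis=0)
Omega = np.cov(R, rowvar=False, ddof=0)
iota = np.ones(R.shape[1])
beta = np.linalg.solve(Omega, iota - M_bar * (iota + mu))
M_star = M_bar + (R - mu) @ beta

R_b = M_star / np.mean(M_star**2) - 1.0          # benchmark return, (8.1.16)
print(np.mean((1.0 + R_b) * M_star))             # satisfies (8.1.8): equals one

r0 = 1.0 / M_bar - 1.0                           # artificial riskless net return
beta_ib = np.array([np.cov(R[:, i], R_b, ddof=0)[0, 1]
                    for i in range(R.shape[1])]) / R_b.var()
lhs = mu - r0
rhs = beta_ib * (R_b.mean() - r0)                # (8.1.17)
print(lhs, rhs)
```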

Two further properties are useful for a geometric interpretation of the Hansen-Jagannathan bounds. Consider Figure 8.1. Panel (a) is the familiar mean-standard deviation diagram for asset returns, with the mean gross return plotted on the vertical axis and the standard deviation of return on the horizontal. Panel (b) is a similar diagram for stochastic discount factors, with the axes rotated; standard deviation is now on the vertical axis and mean on the horizontal. This convention is natural because in panel (a) we think of assets' second moments determining their mean returns, while in panel (b) we vary the mean stochastic discount factor exogenously and trace out the consequences for the standard deviation of the stochastic discount factor. In panel (a), the feasible set of risky asset returns is shown. We augment this with a riskless gross return $1/\bar M$ on the vertical axis; the minimum-variance set is then the tangent line from $1/\bar M$ to the feasible set of risky assets, and the reflection of the tangent in the vertical axis. Property (P1) means that the benchmark portfolio return is in the minimum-variance set. It plots on the lower branch because its positive correlation with the stochastic discount factor gives it a lower mean gross return than $1/\bar M$.

Figure 8.1. (a) Mean-Standard Deviation Diagram for Asset Returns; (b) Implied Standard Deviation-Mean Diagram for Stochastic Discount Factors
We can now state two more properties:

(P4) The ratio of standard deviation to gross mean for the benchmark portfolio satisfies

$$\frac{\sigma[R_{bt}]}{E[1 + R_{bt}]} = \frac{1/\bar M - E[1 + R_{bt}]}{\sigma[R_{bt}]}. \qquad (8.1.18)$$

But the right-hand side of this equation is just the slope of the tangent line in panel (a) of Figure 8.1. As explained in Chapter 5, this slope is the maximum Sharpe ratio for the set of assets.
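Property (P4) can also be checked numerically. Using only hypothetical population moments, the sketch below computes $\sigma[R_{bt}]/E[1 + R_{bt}]$ from the implied moments of $M_t^*(\bar M)$, the right-hand side of (8.1.18), and the maximum Sharpe ratio relative to the riskless gross return $1/\bar M$ from the familiar tangency-portfolio formula $\sqrt{(\mu - r_0\iota)'\Omega^{-1}(\mu - r_0\iota)}$, where μ is the vector of mean net returns and $r_0 = 1/\bar M - 1$. All three numbers should coincide.

```python
import numpy as np

mean_R = np.array([0.07, 0.03, 0.01])            # hypothetical mean net returns
Omega = np.array([[0.0324, 0.0030, 0.0002],
                  [0.0030, 0.0100, 0.0001],
                  [0.0002, 0.0001, 0.0009]])
M_bar = 0.98
iota = np.ones(3)

gap = iota - M_bar * (iota + mean_R)
sd_M_star = np.sqrt(gap @ np.linalg.solve(Omega, gap))   # from (8.1.12)

# Benchmark moments implied by 1 + R_b = M* / E[M*^2], with E[M*] = M_bar.
second_moment = M_bar**2 + sd_M_star**2
mean_gross_b = M_bar / second_moment
sd_b = sd_M_star / second_moment

lhs = sd_b / mean_gross_b                        # sigma[R_b] / E[1 + R_b]
slope = (1.0 / M_bar - mean_gross_b) / sd_b      # right-hand side of (8.1.18)

excess = mean_R - (1.0 / M_bar - 1.0)            # means in excess of r_0
max_sharpe = np.sqrt(excess @ np.linalg.solve(Omega, excess))
print(lhs, slope, max_sharpe)
```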


(P5) The ratio of standard deviation to mean for the benchmark portfolio gross return is a lower bound on the same ratio for the stochastic discount factor. That is,

$$\frac{\sigma[R_{bt}]}{E[1 + R_{bt}]} \leq \frac{\sigma[M_t(\bar M)]}{E[M_t(\bar M)]}. \qquad (8.1.19)$$

Properties (P4) and (P5) establish that the stochastic discount factor in panel (b) must lie above the point where a ray from the origin, with slope equal to the maximum Sharpe ratio in panel (a), passes through the vertical line at $\bar M$. As we vary $\bar M$, we trace out a feasible region for the stochastic discount factor in panel (b). This region is higher, and thus more restrictive, when the maximum Sharpe ratio in panel (a) is large for a variety of mean stochastic discount factors $\bar M$. A set of asset return data with a high maximum Sharpe ratio over a wide range of $\bar M$ is challenging for asset pricing theory in the sense that it requires the stochastic discount factor to be highly variable. By looking at panel (a) one can see that such a data set will contain portfolios with very different mean returns and similar, small standard deviations. A leading example is the set of returns on US Treasury bills, which have differences in mean returns that are absolutely small, but large relative to the standard deviations of bill returns.
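Tracing out the feasible region amounts to evaluating (8.1.12) over a grid of values for $\bar M$. The sketch below does this for hypothetical moments of a volatile stock return and a low-volatility bill return and reports the grid point with the smallest bound; the numbers are illustrative and are not the data behind Figure 8.3.

```python
import numpy as np

mean_R = np.array([0.07, 0.01])          # hypothetical stock and bill mean net returns
Omega = np.array([[0.0324, 0.0002],
                  [0.0002, 0.0009]])
iota = np.ones(2)

def sd_bound(M_bar):
    """Square root of the bound (8.1.12) for a given mean SDF."""
    gap = iota - M_bar * (iota + mean_R)
    return np.sqrt(gap @ np.linalg.solve(Omega, gap))

grid = np.linspace(0.90, 1.06, 161)
bounds = np.array([sd_bound(m) for m in grid])
i_min = bounds.argmin()
print(f"minimum bound {bounds[i_min]:.3f} at M_bar = {grid[i_min]:.3f}")
# gap is affine in M_bar, so the squared bound is a convex quadratic in M_bar
# and the region rises on both sides of the minimizing value.
```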

The above analysis applies to returns themselves; the calculations are somewhat simpler if excess returns are used. Writing the excess return on asset i over some reference asset k (not necessarily riskless) as $Z_{it} = R_{it} - R_{kt}$, and the vector of excess returns as $Z_t$, the basic condition (8.1.4) becomes

$$0 = E[Z_t M_t]. \qquad (8.1.20)$$

Proceeding as before, we form $M_t^*(\bar M) = \bar M + (Z_t - E[Z_t])'\tilde\beta_{\bar M}$, where the tilde is used to indicate that β is defined with excess returns. We find that $\tilde\beta_{\bar M} = \tilde\Omega^{-1}(-\bar M\,E[Z_t])$, where $\tilde\Omega$ is the variance-covariance matrix of excess returns. It follows that the lower bound on the variance of the stochastic discount factor is now

$$\mathrm{Var}[M_t^*(\bar M)] = \bar M^2\,E[Z_t]'\,\tilde\Omega^{-1}E[Z_t]. \qquad (8.1.21)$$
If we have only a single excess return $Z_t$, then this condition simplifies to $\mathrm{Var}[M_t^*(\bar M)] = \bar M^2(E[Z_t])^2/\mathrm{Var}[Z_t]$, or

$$\frac{\sigma[M_t^*(\bar M)]}{\bar M} = \frac{|E[Z_t]|}{\sigma[Z_t]}. \qquad (8.1.22)$$

This is illustrated in Figure 8.2, which has the same structure as Figure 8.1. Now the restriction on the stochastic discount factor in panel (b) is that it should lie above a ray from the origin with the same slope as a ray from the origin through the single risky excess return in panel (a).
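For excess returns, (8.1.21) scales a Sharpe-ratio-type quadratic form by $\bar M$. A minimal sketch with hypothetical moments follows (the function name is ours); with a single excess return the vector formula collapses to the ratio in (8.1.22), and adding assets can only raise the bound because the maximum Sharpe ratio over a larger set of assets cannot be smaller.

```python
import numpy as np

def sd_bound_excess(mean_Z, Omega_Z, M_bar):
    """Lower bound on the standard deviation of an SDF with mean M_bar, from (8.1.21)."""
    mean_Z = np.atleast_1d(mean_Z)
    Omega_Z = np.atleast_2d(Omega_Z)
    return M_bar * np.sqrt(mean_Z @ np.linalg.solve(Omega_Z, mean_Z))

# Single excess return (hypothetical): mean 0.06, standard deviation 0.18.
single = sd_bound_excess(0.06, 0.18**2, M_bar=1.0)
print(single, 0.06 / 0.18)               # (8.1.22): identical

# Two excess returns: the first is the same asset as above.
mean_Z = np.array([0.06, 0.02])
Omega_Z = np.array([[0.0324, 0.0040],
                    [0.0040, 0.0100]])
two = sd_bound_excess(mean_Z, Omega_Z, M_bar=1.0)
print(two)
```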


Implications of Nonnegativity 

So far we have ignored the restriction that $M_t$ must be nonnegative. Hansen and Jagannathan (1991) show that this can be handled fairly straightforwardly when an unconditionally riskless asset exists. In this case the mean of $M_t$ is known, and the problem can be restated as finding coefficients α that define a random variable

$$M_t^{*+} = \left((\iota + R_t)'\alpha\right)^+, \qquad (8.1.23)$$

where $X^+ = \max(X, 0)$ is the nonnegative part of X, subject to the constraint

$$E[(\iota + R_t)M_t^{*+}] = E\left[(\iota + R_t)\left((\iota + R_t)'\alpha\right)^+\right] = \iota. \qquad (8.1.24)$$


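In the absence of the nonnegativity constraint, this yields the previous solution for the case where there is an unconditionally riskless asset. With the nonnegativity constraint, it is much harder to find a coefficient vector α that satisfies (8.1.24); Hansen and Jagannathan (1991) discuss strategies for transforming the problem to make the solution easier. Once a coefficient vector is found, however, it is easy to show that $M_t^{*+}$ has minimum variance among all nonnegative random variables $M_t$ satisfying (8.1.8). To see this, consider any other such $M_t$ and note that

$$\begin{aligned}
E[M_t M_t^{*+}] &= E\left[M_t\left((\iota + R_t)'\alpha\right)^+\right] \\
&\geq E[M_t(\iota + R_t)'\alpha] \\
&= E[M_t^{*+}(\iota + R_t)']\,\alpha = E[(M_t^{*+})^2]. \qquad (8.1.25)
\end{aligned}$$

The inequality uses $M_t \geq 0$ and $X^+ \geq X$; the next equality uses the fact that $M_t$ and $M_t^{*+}$ both satisfy (8.1.8); the last uses $(X^+)^2 = X^+ X$. But if $E[M_t M_t^{*+}] \geq E[(M_t^{*+})^2]$, then $E[M_t^2] \geq E[(M_t^{*+})^2]$, since the correlation between these variables cannot be greater than one. Because the riskless asset fixes a common mean for all these discount factors, the ranking of second moments is also a ranking of variances.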

The above analysis can be generalized to deal with the more realistic case in which there is no unconditionally riskless asset, by augmenting the return vector with a hypothetical riskless asset and varying the return on this asset. This introduces some technical complications, which are discussed by Hansen and Jagannathan (1991).
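One way to compute $M_t^{*+}$, offered here as an illustration rather than as the procedure of Hansen and Jagannathan (1991), is to note that (8.1.24) is the first-order condition for minimizing the convex function $\tfrac{1}{2}E[(((\iota + R_t)'\alpha)^+)^2] - \iota'\alpha$. The sketch below applies damped Newton steps to the sample version of this problem, starting from the unconstrained solution. The data are simulated: one riskless asset with gross return 1.01 and one lognormal risky asset with hypothetical parameters chosen so that the unconstrained discount factor is sometimes negative.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 20_000
risky = np.exp(rng.normal(0.08, 0.15, size=T))   # hypothetical gross risky return
riskless = np.full(T, 1.01)                      # unconditionally riskless gross return
x = np.column_stack([riskless, risky])           # row t is (iota + R_t)'
iota = np.ones(2)

def objective(alpha):
    """Sample version of 0.5 E[((x'alpha)^+)^2] - iota'alpha (convex in alpha)."""
    m = np.maximum(x @ alpha, 0.0)
    return 0.5 * np.mean(m**2) - iota @ alpha

def gradient(alpha):
    """Sample E[x (x'alpha)^+] - iota; zero exactly when (8.1.24) holds."""
    m = np.maximum(x @ alpha, 0.0)
    return (x * m[:, None]).mean(axis=0) - iota

# Unconstrained solution E[x x'] alpha = iota: the earlier, possibly negative, M*.
alpha0 = np.linalg.solve(x.T @ x / T, iota)
share_negative = np.mean(x @ alpha0 < 0)

alpha = alpha0.copy()
for _ in range(100):
    g = gradient(alpha)
    if np.max(np.abs(g)) < 1e-12:
        break
    active = x @ alpha > 0
    H = x[active].T @ x[active] / T              # Hessian on the region where M*+ > 0
    step = np.linalg.solve(H, g)
    t = 1.0
    while t > 1e-10 and objective(alpha - t * step) > objective(alpha) - 0.25 * t * (g @ step):
        t *= 0.5                                 # backtracking (Armijo) line search
    alpha = alpha - t * step

M_plus = np.maximum(x @ alpha, 0.0)              # (8.1.23)
print("share of negative M* without the constraint:", share_negative)
print("pricing errors:", gradient(alpha))
print("mean and sd of M*+:", M_plus.mean(), M_plus.std())
```

Because an unconditionally riskless asset is included, the first pricing condition forces the mean of $M_t^{*+}$ to equal 1/1.01.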
Figure 8.2. (a) Mean-Standard Deviation Diagram for a Single Excess Asset Return; (b) Implied Standard Deviation-Mean Diagram for Stochastic Discount Factors


A First Look at the Equity Premium Puzzle

The Hansen-Jagannathan approach can be used to understand the well-known equity premium puzzle of Mehra and Prescott (1985).² Mehra and Prescott argue that the average excess return on the US stock market, the equity premium, is too high to be easily explained by standard asset pricing models. They make this point in the context of a tightly parametrized consumption-based model, but it can be made more generally using the excess-return restriction (8.1.22). Over the period 1889 to 1994, the annual excess simple return on the Standard and Poor's stock index over commercial paper³ has a standard deviation of 18% and a mean of 6%. Thus the slope of the rays from the origin in Figure 8.2 should be 0.06/0.18 = 0.33, meaning that the standard deviation of the stochastic discount factor must be at least 33% if it has a mean of one. As we shall see in the next section, the standard consumption-based model with a risk-aversion coefficient in the conventional range implies that the stochastic discount factor has a mean near one, but an annual standard deviation much less than 33%.

[Figure: standard deviation of the stochastic discount factor, σ(M), plotted against its mean $\bar M$ for $\bar M$ from 0.80 to 1.12]

²Cochrane and Hansen (1992) approach the equity premium puzzle from this point of view. Kocherlakota (1996) surveys the large literature on the puzzle.

³The return on six-month commercial paper, rolled over in January and July, is used instead of a Treasury bill return because Treasury bill data are not available in the early part of this long sample period. Mehra and Prescott (1985) splice together commercial paper and Treasury bill rates, whereas here we use commercial paper rates throughout the sample period for consistency. The choice of short-term rates makes little difference to the results. Table 8.1 below gives some simple moments for log asset returns, but the moments stated here are for simple returns.
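The 33% figure is the single-excess-return bound (8.1.22) evaluated at $\bar M = 1$; the short sketch below simply reproduces that arithmetic from the rounded moments quoted above.

```python
mean_excess, sd_excess = 0.06, 0.18   # annual excess simple return on stocks over CP (rounded)
M_bar = 1.0

sd_M_min = M_bar * mean_excess / sd_excess   # (8.1.22)
print(round(sd_M_min, 2))                    # 0.33
```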
Figure 8.3, whose format follows Hansen and Jagannathan (1991), also illustrates the equity premium puzzle. The figure shows the feasible region for the stochastic discount factor implied by equation (8.1.12) and the annual data on real stock and commercial paper returns over the period 1891 to 1994. The figure does not use the nonnegativity restriction discussed in the previous section. The global minimum standard deviation for the stochastic discount factor is about 0.33, corresponding to a mean stochastic discount factor of about 0.98 and an unconditional riskless return of about 2%. As the mean moves away from 0.98, the standard-deviation bound rapidly increases. The difference between the feasible region in Figure 8.3 and the region above a ray from the origin with slope 0.33 is caused by the fact that Figure 8.3 uses bond and stock returns separately rather than merely the excess return on stocks over bonds. The figure also shows mean-standard deviation points corresponding to various degrees of risk aversion and a fixed time discount rate in a consumption-based representative agent asset pricing model of the type discussed in the next section. The first point above the horizontal axis has relative risk aversion of one; successive points have risk aversion of two, three, and so on. The points do not enter the feasible region until relative risk aversion reaches a value of 25.

In interpreting Figure 8.3 and similar figures, one should keep in mind that both the volatility bound for the stochastic discount factor and the points implied by particular asset pricing models are estimated with error. Statistical methods are available to test whether a particular model satisfies the volatility bound (see for example Burnside [1994], Cecchetti, Lam, and Mark [1994], and Hansen, Heaton, and Luttmer [1995]). These methods use the Generalized Method of Moments of Hansen (1982), discussed in the Appendix.


8.2 Consumption-Based Asset Pricing with Power Utility 


In Section 8.1 we showed how an equation relating asset returns to the stochastic discount factor, (8.1.3), could be derived from the first-order condition of a single investor's intertemporal consumption and portfolio choice problem. This equation is restated here for convenience:

$$1 = E_t[(1 + R_{i,t+1})M_{t+1}]. \qquad (8.2.1)$$

It is common in empirical research to assume that individuals can be aggregated into a single representative investor, so that aggregate consumption can be used in place of the consumption of any particular individual. Equation (8.2.1) with $M_{t+1} = \delta\,U'(C_{t+1})/U'(C_t)$, where $C_t$ is aggregate consumption, is known as the consumption CAPM, or CCAPM.




In this section we examine the empirical implications of the CCAPM. We begin by assuming that there is a representative agent who maximizes a time-separable power utility function, so that

$$U(C_t) = \frac{C_t^{1-\gamma} - 1}{1 - \gamma}, \qquad (8.2.2)$$

where γ is the coefficient of relative risk aversion. As γ approaches one, the utility function in (8.2.2) approaches the log utility function $U(C_t) = \log(C_t)$.

The power utility function has several important properties. First, it is scale-invariant: with constant return distributions, risk premia do not change over time as aggregate wealth and the scale of the economy increase. A related property is that if different investors in the economy have the same power utility function and can freely trade all the risks they face, then even if they have different wealth levels they can be aggregated into a single representative investor with the same utility function as the individual investors. This provides some justification for the use of aggregate consumption, rather than individual consumption, in the CCAPM.⁴

A property of power utility that may be less desirable is that it rigidly links two important concepts. When utility has the power form, the elasticity of intertemporal substitution (the derivative of planned log consumption growth with respect to the log interest rate), which we write as ψ, is the reciprocal of the coefficient of relative risk aversion γ. Hall (1988) has argued that this linkage is inappropriate because the elasticity of intertemporal substitution concerns the willingness of an investor to move consumption between time periods (it is well-defined even if there is no uncertainty), whereas the coefficient of relative risk aversion concerns the willingness of an investor to move consumption between states of the world (it is well-defined even in a one-period model with no time dimension). In Section 8.3.2 below we discuss a more general utility specification, due to Epstein and Zin (1991) and Weil (1989), that preserves the scale-invariance of power utility but breaks the tight link between the coefficient of relative risk aversion and the elasticity of intertemporal substitution.

Taking the derivative of (8.2.2) with respect to consumption, we find that marginal utility $U'(C_t) = C_t^{-\gamma}$. Substituting into (8.2.1) we get

$$1 = E_t\left[(1 + R_{i,t+1})\,\delta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right], \qquad (8.2.3)$$

which was first derived by Grossman and Shiller (1981), following the closely related continuous-time model of Breeden (1979). A typical objective of empirical research is to estimate the coefficient of relative risk aversion γ (or its reciprocal ψ) and to test the restrictions imposed by (8.2.3). It is easiest to do this if one assumes that asset returns and aggregate consumption are jointly homoskedastic and lognormal. Although this implies constant expected excess log returns, and thus cannot fit the data, it is useful for building intuition and understanding methods that can be applied to more realistic models. Accordingly we make this assumption in Section 8.2.1 and relax it in Section 8.2.2, where we discuss the use of Hansen's (1982) Generalized Method of Moments (GMM) to test the power utility model without making distributional assumptions on returns and consumption.

⁴Grossman and Shiller (1982) show that this result generalizes to a model with nontraded assets (uninsurable idiosyncratic risks) if consumption and asset prices follow diffusion processes in a continuous-time model.


Our discussion follows closely the seminal work of Hansen and Singleton 
(1982, 1983). 


8.2.1 Power Utility in a Lognormal Model 


When a random variable X is conditionally lognormally distributed, it has the convenient property (mentioned in Chapter 1) that

$$\log E_t[X] = E_t[\log X] + \tfrac{1}{2}\mathrm{Var}_t[\log X], \qquad (8.2.4)$$

where $\mathrm{Var}_t[\log X] = E_t[(\log X - E_t[\log X])^2]$. If in addition X is conditionally homoskedastic, then $\mathrm{Var}_t[\log X] = E[(\log X - E_t[\log X])^2] = \mathrm{Var}[\log X - E_t[\log X]]$. Thus with joint conditional lognormality and homoskedasticity of asset returns and consumption, we can take logs of (8.2.3), use the notational convention that lowercase letters denote logs, and obtain

$$0 = E_t[r_{i,t+1}] + \log\delta - \gamma E_t[\Delta c_{t+1}] + \tfrac{1}{2}\left[\sigma_i^2 + \gamma^2\sigma_c^2 - 2\gamma\sigma_{ic}\right]. \qquad (8.2.5)$$

Here the notation $\sigma_{ic}$ denotes the unconditional covariance of innovations $\mathrm{Cov}[r_{i,t+1} - E_t[r_{i,t+1}],\ \Delta c_{t+1} - E_t[\Delta c_{t+1}]]$, $\sigma_i^2 = \sigma_{ii}$, and $\sigma_c^2 = \sigma_{cc}$.
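Equation (8.2.5) is (8.2.4) applied to $X = (1 + R_{i,t+1})\,\delta(C_{t+1}/C_t)^{-\gamma}$, whose log is $r_{i,t+1} + \log\delta - \gamma\Delta c_{t+1}$. The sketch below checks the lognormal formula by Monte Carlo; δ, γ, and the means, variances, and covariance of $(r_{i,t+1}, \Delta c_{t+1})$ are hypothetical values, not estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
delta, gamma = 0.97, 5.0                    # hypothetical preference parameters
mean = np.array([0.06, 0.017])              # E[r_i], E[Delta c] (hypothetical)
var_i, var_c, cov_ic = 0.17**2, 0.033**2, 0.0027
cov = np.array([[var_i, cov_ic],
                [cov_ic, var_c]])

draws = rng.multivariate_normal(mean, cov, size=1_000_000)
r, dc = draws[:, 0], draws[:, 1]
log_X = r + np.log(delta) - gamma * dc      # log of (1 + R) delta (C'/C)^(-gamma)

monte_carlo = np.log(np.mean(np.exp(log_X)))
closed_form = (mean[0] + np.log(delta) - gamma * mean[1]
               + 0.5 * (var_i + gamma**2 * var_c - 2 * gamma * cov_ic))
print(monte_carlo, closed_form)             # (8.2.5) requires this quantity to be zero
```

With these made-up parameters the pricing equation is not imposed, so the common value need not be zero; the point is only that the simulated and closed-form values agree.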

Equation (8.2.5), which was first derived by Hansen and Singleton (1983), has both time-series and cross-sectional implications. In the time series, the riskless real interest rate obeys

$$r_{f,t+1} = -\log\delta - \frac{\gamma^2\sigma_c^2}{2} + \gamma E_t[\Delta c_{t+1}]. \qquad (8.2.6)$$

The riskless real rate is linear in expected consumption growth, with slope coefficient equal to the coefficient of relative risk aversion. The equation can be reversed to express expected consumption growth as a linear function of the riskless real interest rate, with slope coefficient $\psi = 1/\gamma$; in fact this relation between expected consumption growth and the interest rate is what defines the elasticity of intertemporal substitution.
The assumption of homoskedasticity makes the log risk premium on any asset over the riskless real rate constant, so expected real returns on other assets are also linear in expected consumption growth with slope coefficient γ. We have

$$E_t[r_{i,t+1} - r_{f,t+1}] + \frac{\sigma_i^2}{2} = \gamma\sigma_{ic}. \qquad (8.2.7)$$

The variance term on the left-hand side of (8.2.7) is a Jensen's Inequality adjustment arising from the fact that we are describing expectations of log returns. We can eliminate the need for this adjustment by rewriting the equation in terms of the log of the expected ratio of gross returns:

$$\log E_t\left[\frac{1 + R_{i,t+1}}{1 + R_{f,t+1}}\right] = \gamma\sigma_{ic}.$$

Equation (8.2.7) shows that risk premia are determined by the coefficient of 
relative risk aversion times covariance with consumption growth. Of course, 
we have already presented evidence against the implication of this model 
that risk premia are constant. Nevertheless we explore the model as a useful 
way to develop intuition and understand econometric techniques used in 
more general models. 


A Second Look at the Equity Premium Puzzle 

Equation (8.2.7) clarifies the argument of Mehra and Prescott (1985) that 
the equity premium is too high to be consistent with observed consumption 
behavior unless investors are extremely risk averse. Mehra and Prescott's 
analysis is complicated by the fact that they do not use observed stock re- 
turns, but instead calculate stock returns implied by the (counterfactual) 
assumption that stock dividends equal consumption. Problem 8.2 carries 
out a loglinear version of this calculation. 

One can appreciate the equity premium puzzle more directly by ex- 
amining the moments of log real stock and commercial paper returns and 
consumption growth shown in Table 8.1. The asset returns are measured 
annually over the period 1889 to 1994. The mean excess log return of stocks
over commercial paper is 4.2% with a standard deviation of 17.7%; using 
the formula for the mean of a lognormal random variable, this implies that 
the mean excess simple return is 6% as stated earlier. 
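The step from a 4.2% mean and 17.7% standard deviation for the excess log return to a mean excess simple return of about 6% uses $E[X] = \exp(E[\log X] + \tfrac{1}{2}\mathrm{Var}[\log X])$, applied here to the ratio of gross stock and commercial paper returns, whose mean minus one is approximately the mean excess simple return. A quick check with the rounded figures:

```python
import math

mean_log_excess, sd_log_excess = 0.042, 0.177
# E[(1 + R_stock) / (1 + R_cp)] under lognormality, minus one
mean_simple_excess = math.exp(mean_log_excess + 0.5 * sd_log_excess**2) - 1.0
print(f"{mean_simple_excess:.3f}")   # 0.059, roughly the 6% quoted earlier
```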

As is conventional in the literature, the consumption measure used in Table 8.1 is consumption of nondurables and services.⁵ The covariance of the excess log stock return with log consumption growth is the correlation of the two series, times the standard deviation of the log stock return, times the standard deviation of log consumption growth. Because consumption of nondurables and services is a smooth series, log consumption growth has a standard deviation of only 3.3%; hence the excess stock return has a covariance with log consumption growth of only 0.003 despite the fact that the correlation of the two series is about 0.5. Substituting the moments in Table 8.1 into (8.2.7) shows that a risk-aversion coefficient of 19 is required to fit the equity premium.⁶ This is much greater than 10, the maximum value considered plausible by Mehra and Prescott.

⁵This implicitly assumes that utility is separable across this form of consumption and other sources of utility. In Section 8.4 we discuss ways in which this assumption can be relaxed.

Table 8.1. Moments of consumption growth and asset returns

                                Standard     Correlation with      Covariance with
Variable             Mean       deviation    consumption growth    consumption growth
Consumption growth   0.0172     0.0328        1.0000                0.0011
Stock return         0.0601     0.1674        0.4902                0.0027
CP return            0.0183     0.0544       -0.1157               -0.0002
Stock-CP return      0.0418     0.1774        0.4979                0.0029

Note: Consumption growth is the change in log real consumption of nondurables and services. The stock return is the log real return on the S&P 500 index since 1926, and the return on a comparable index from Grossman and Shiller (1981) before 1926. CP is the real return on six-month commercial paper, bought in January and rolled over in July. All data are annual, 1889 to 1994.
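Rearranging (8.2.7) gives $\gamma = (E[r_{i,t+1} - r_{f,t+1}] + \sigma_i^2/2)/\sigma_{ic}$. The sketch below plugs in the Table 8.1 moments, treating commercial paper as the riskless asset and, as in footnote 6, using moments of the raw series in place of innovations. Since the stock and commercial paper returns are both risky, the variance in the Jensen's Inequality term can reasonably be measured either from the stock return or from the Stock-CP return; the sketch tries both, and both land between 19 and 20, close to the value of 19 in the text.

```python
# Moments from Table 8.1 (annual log real returns and consumption growth).
mean_excess = 0.0418     # mean Stock-CP log return
cov_excess_c = 0.0029    # covariance of the Stock-CP return with consumption growth
sd_choices = {"stock return": 0.1674, "Stock-CP return": 0.1774}

implied_gamma = {}
for label, sd in sd_choices.items():
    # gamma = (E[r_i - r_f] + sigma_i^2 / 2) / sigma_ic, rearranged from (8.2.7)
    implied_gamma[label] = (mean_excess + 0.5 * sd**2) / cov_excess_c
    print(f"variance from {label}: gamma = {implied_gamma[label]:.1f}")
# prints 19.2 and 19.8
```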

It is worth noting that the implied risk-aversion coefficient is sensitive to the timing convention used for consumption. While asset prices are measured at the end of each period, consumption is measured as a flow during a period. In correlating asset returns with consumption growth one can assume that measured consumption represents beginning-of-period consumption or end-of-period consumption. The former assumption leads one to correlate the return over a period with the ratio of the next period's consumption to this period's consumption, while the latter assumption leads one to correlate the return over a period with the ratio of this period's consumption to last period's consumption. The former assumption, which we use here, gives a much higher correlation between asset returns and consumption growth and hence a lower risk-aversion coefficient than the latter assumption.⁷

⁶Table 8.1 reports the moments of asset returns and consumption growth, whereas equation (8.2.7) requires the moments of innovations in these series. However, the variation in conditional expected returns and consumption growth seems to be small enough that the moments of innovations are similar to the moments of the raw series.

⁷Grossman, Melino, and Shiller (1987) handle this problem more carefully by assuming an underlying continuous-time model and deriving its implications for time-averaged data.
nathan (1991) in the following way. In the representative-agent model with power utility, the stochastic discount factor is $M_{t+1} = \delta(C_{t+1}/C_t)^{-\gamma}$, and the log stochastic discount factor is $m_{t+1} = \log\delta - \gamma\Delta c_{t+1}$. If we are willing to make the approximation $m_{t+1} \approx M_{t+1} - 1$, which will be accurate if $M_{t+1}$ has a mean close to one and is not too variable, then we have $\mathrm{Var}[M_{t+1}] \approx \mathrm{Var}[m_{t+1}] = \gamma^2\,\mathrm{Var}[\Delta c_{t+1}]$. Equivalently, the standard deviation of the stochastic discount factor must be approximately the coefficient of relative risk aversion times the standard deviation of consumption growth. Using the Hansen-Jagannathan methodology we found that the standard deviation of the stochastic discount factor must be at least 0.33 to fit our annual stock market data. Since the standard deviation of consumption growth is 0.033, this by itself implies a coefficient of risk aversion of at least 0.33/0.033 = 10. But a coefficient of risk aversion this low is consistent with the data only if stock returns and consumption are perfectly correlated. If we use the fact that the empirical correlation is about 0.5, we can use the tighter volatility bound in equation (8.1.15) to double the required standard deviation of the stochastic discount factor and hence double the risk-aversion coefficient to about 20.
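The two bounds on γ in the preceding paragraph are simple ratios. A short sketch with the rounded figures from the text, treating the correlation of 0.5 as the correlation that enters (8.1.15):

```python
sd_M_bound = 0.33   # Hansen-Jagannathan bound at a mean of one (annual stock data)
sd_dc = 0.033       # standard deviation of consumption growth
corr = 0.5          # approximate correlation of stock returns with consumption growth

# sigma[M] is approximately gamma * sigma[Delta c], so gamma >= sigma[M] / sigma[Delta c].
gamma_perfect_corr = sd_M_bound / sd_dc
# (8.1.15): Var[M] = Var[M*] / Corr^2, so the required sigma[M] rises to 0.33 / corr.
gamma_actual_corr = (sd_M_bound / corr) / sd_dc
print(round(gamma_perfect_corr, 1), round(gamma_actual_corr, 1))   # 10.0 20.0
```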


The Riskfree Rate Puzzle

One response to the equity premium puzzle is to consider larger values for the coefficient of relative risk aversion γ. Kandel and Stambaugh (1991) have advocated this.⁸ However, this leads to a second puzzle. Equation (8.2.5) implies that the unconditional mean riskless interest rate is

$$E[r_{ft}] = -\log\delta + \gamma g - \frac{\gamma^2\sigma_c^2}{2}, \qquad (8.2.8)$$

where g is the mean growth rate of consumption. The average riskless interest rate is determined by three factors. First, the riskless rate is high if the time preference rate $-\log\delta$ is high. Second, the riskless rate is high if the average consumption growth rate g is high, for then the representative agent has an incentive to try to borrow to reduce the discrepancy between consumption today and in the future. The strength of this effect is inversely proportional to the elasticity of intertemporal substitution in consumption; in a power utility model where risk aversion equals the reciprocal of intertemporal substitution, the strength of the effect is directly proportional to γ. Finally, the riskless rate is low if the variance of consumption growth is high, for then the representative agent has a precautionary motive for saving. The strength of this precautionary saving effect is proportional to the square of risk aversion, γ².

⁸One might think that introspection would be sufficient to rule out very large values of γ. However, Kandel and Stambaugh (1991) point out that introspection can deliver very different estimates of risk aversion depending on the size of the gamble considered. This suggests that introspection can be misleading or that some more general model of utility is needed.

Given the historical average short-term real interest rate of 1.8%, the historical average consumption growth rate of 1.8%, and the historical standard deviation of consumption growth of 3.3% shown in Table 8.1, a γ of 19 implies a discount factor δ of 1.12; this is greater than one, corresponding to a negative rate of time preference. Weil (1989) calls this the riskfree rate puzzle. Intuitively, the puzzle is that if investors are extremely risk-averse (γ is large), then with power utility they must also be extremely unwilling to substitute intertemporally (ψ is small). Given positive average consumption growth, a low riskless interest rate, and a positive rate of time preference, such investors would have a strong desire to borrow from the future. A low riskless interest rate is possible in equilibrium only if investors have a negative rate of time preference that reduces their desire to borrow.
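As a rough check on this arithmetic, equation (8.2.8) can be inverted for δ. The sketch below is an illustration, not the book's own computation: the function name is hypothetical, and it plugs in the rounded moments quoted above, so the result need not reproduce the 1.12 in the text exactly, although for γ = 19 it is likewise above one.

```python
import math

def implied_discount_factor(rf, g, sigma_c, gamma):
    """Solve (8.2.8), E[r_f] = -log(delta) + gamma*g - gamma**2 * sigma_c**2 / 2, for delta."""
    log_delta = -rf + gamma * g - 0.5 * gamma**2 * sigma_c**2
    return math.exp(log_delta)

# Rounded annual moments quoted in the text, in decimal form.
rf, g, sigma_c = 0.018, 0.018, 0.033

for gamma in (1, 5, 10, 19):
    delta = implied_discount_factor(rf, g, sigma_c, gamma)
    # delta > 1 corresponds to a negative rate of time preference, -log(delta) < 0
    print(f"gamma={gamma:>2}  delta={delta:.3f}  time preference rate={-math.log(delta):+.3f}")
```

Raising γ increases the borrowing motive γg faster than the precautionary term offsets it over this range, so the δ needed to hold the riskless rate at 1.8% climbs above one.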

Of course, these calculations depend on the exact moments given in Table 8.1. In some data sets an even larger coefficient of relative risk aversion is needed to fit the equity premium: Kandel and Stambaugh (1991), for example, consider a risk-aversion coefficient of 29. With risk aversion this large, the precautionary savings term −γ²σ_c²/2 in equation (8.2.8) reduces the equilibrium riskfree rate, and so Kandel and Stambaugh do not need a negative rate of time preference to fit the riskfree rate. A visual impression of this effect is given in Figure 8.3, which shows the mean stochastic discount factor first decreasing and then increasing as γ increases with a fixed δ. Since the riskless interest rate is the reciprocal of the mean stochastic discount factor, this implies that the riskless interest rate first increases and then decreases with γ. The behavior of the riskless interest rate is always a problem for models with high γ, however, as the interest rate is extremely sensitive to the parameters g and σ_c², and reasonable values of the interest rate are achieved only as a knife-edge case when the effects of g and σ_c² almost exactly offset each other.
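The hump shape described here follows directly from (8.2.8): with δ fixed, the mean log riskless rate is a concave quadratic in γ, rising while γg dominates and falling once the precautionary term γ²σ_c²/2 takes over, with a peak at γ = g/σ_c². The sketch below traces this out. The fixed δ = 0.99 and the grid are illustrative assumptions, and the mean stochastic discount factor is computed as exp(−r_f), which holds exactly in this lognormal setting.

```python
import numpy as np

def riskless_rate(gamma, delta, g, sigma_c):
    """Mean log riskless rate from (8.2.8): power utility, lognormal consumption growth."""
    return -np.log(delta) + gamma * g - 0.5 * gamma**2 * sigma_c**2

delta, g, sigma_c = 0.99, 0.018, 0.033     # illustrative fixed delta, rounded moments
gammas = np.linspace(0.0, 40.0, 4001)       # risk-aversion grid with step 0.01
rf = riskless_rate(gammas, delta, g, sigma_c)
mean_sdf = np.exp(-rf)                      # lognormal case: r_f = -log E[M]

gamma_peak = gammas[np.argmax(rf)]          # rate rises, then falls, as gamma grows
print(f"riskless rate peaks near gamma = {gamma_peak:.2f}; g/sigma_c^2 = {g / sigma_c**2:.2f}")
```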


Is the Equity Premium Puzzle a Robust Phenomenon?

Another response to the equity premium puzzle is to argue that it is an artefact of the particular data set on US stock returns. While we have not reported standard errors for risk-aversion estimates, careful empirical research by Cecchetti, Lam, and Mark (1994), Kocherlakota (1996), and others shows that the data can reject risk-aversion coefficients lower than about 10 using standard statistical methods. However, the validity of these tests depends on the characteristics of the data set in which they are used.

Rietz (1988) has argued that there may be a peso problem in these data. A peso problem arises when there is a small positive probability of an important event, and investors take this probability into account when setting market prices. If the event does not occur in a particular sample period, investors may appear to be irrational in the sample. While it may seem implausible that this could be an important problem in 100 years of data, Rietz (1988) argues that an economic catastrophe that destroys almost all stock-market value can be extremely unlikely and yet have a major depressing effect on stock prices.
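A small hypothetical calculation shows the measurement side of the peso problem. The numbers below (a 1% annual chance of a disaster that destroys 90% of value, an 8% return otherwise) are illustrative assumptions, not estimates. The point is that a 100-year sample has a substantial chance of containing no disaster at all, in which case the sample mean overstates the true mean. The sketch says nothing about returns on short-term debt, which is the part of Rietz's argument questioned below.

```python
def peso_example(p_disaster, r_normal, r_disaster, years):
    """Compare the true mean return with the mean observed in a disaster-free sample."""
    true_mean = (1 - p_disaster) * r_normal + p_disaster * r_disaster
    no_disaster_mean = r_normal                      # every observed year is a normal year
    prob_no_disaster = (1 - p_disaster) ** years     # chance the whole sample misses the event
    return true_mean, no_disaster_mean, prob_no_disaster

# Hypothetical parameters chosen only for illustration.
true_mean, sample_mean, prob_clean = peso_example(
    p_disaster=0.01, r_normal=0.08, r_disaster=-0.90, years=100)
print(f"true mean {true_mean:.4f}, disaster-free sample mean {sample_mean:.4f}, "
      f"P(no disaster in 100 years) = {prob_clean:.3f}")
```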

A related point has been made by Brown, Goetzmann, and Ross (1995).
These authors argue that financial economists concentrate on the US stock 
market precisely because it has survived and grown to become the world's 
largest market. In some other markets, such as those of Russia, investors 
have had all their wealth expropriated during the last 100 years and so there 
is no continuous record of market prices; in others, such as the Argentine 
market, returns have been so poor that today these markets are regarded as 
comparatively less important emerging markets. If this survivorship effect 
is important, estimates of average US stock returns are biased upwards. 

Although these points have some validity, they are unlikely to be the whole explanation for the equity premium puzzle. The difficulty with the Rietz (1988) argument is that it requires not only an economic catastrophe, but one which affects stock market investors more seriously than investors in short-term debt instruments. The Russian example suggests that a catastrophe causes very low returns on debt as well as on equity, in which case the peso problem affects estimates of the average levels of returns but not estimates of the equity premium. Also, there seems to be a surprisingly large equity premium not only in the last 100 years of US data, but also in US data from earlier in the 19th century, as shown by Siegel (1994), and in data from other countries, as shown by Campbell (1996b).


Time-Variation in Expected Asset Returns and Consumption Growth 

Equation (8.2.5) gives a relation between rational expectations of asset returns and rational expectations of consumption growth. It implies that expected asset returns are perfectly correlated with expected consumption growth, but the standard deviation of expected asset returns is γ times as large as the standard deviation of expected consumption growth. Equivalently, the standard deviation of expected consumption growth is ψ = 1/γ times as large as the standard deviation of expected asset returns.

This suggests an alternative way to estimate γ or ψ. Hansen and Singleton (1983), followed by Hall (1988) and others, have proposed instrumental variables (IV) regression as a way to approach the problem. If we define an error term $\eta_{i,t+1} \equiv r_{i,t+1} - E_t[r_{i,t+1}] - \gamma(\Delta c_{t+1} - E_t[\Delta c_{t+1}])$, then we can rewrite equation (8.2.5) as a regression equation,


$$r_{i,t+1} = \mu_i + \gamma\,\Delta c_{t+1} + \eta_{i,t+1}. \qquad (8.2.9)$$


In general the error term $\eta_{i,t+1}$ will be correlated with realized consumption growth, so OLS is not an appropriate estimation method. However, $\eta_{i,t+1}$ is uncorrelated with any variables in the information set at time t. Hence any lagged variables correlated with asset returns can be used as instruments in an IV regression to estimate the coefficient of relative risk aversion γ.

Table 8.2. Instrumental variables regressions for returns and consumption growth.

$$r_{i,t+1} = \mu_i + \gamma\,\Delta c_{t+1} + \eta_{i,t+1} \qquad (8.2.9)$$
$$\Delta c_{t+1} = \tau_i + \psi\, r_{i,t+1} + \xi_{i,t+1} \qquad (8.2.10)$$

                      First-stage regressions   γ          ψ          Test       Test
Return (instruments)  r_i         Δc            (s.e.)     (s.e.)     (8.2.9)    (8.2.10)

Commercial paper      0.275       0.031         -1.981     -0.088     0.106      0.078
(1)                   (0.000)     (0.185)       (1.318)    (0.113)    (000)      (0.2254)

Stock index           0.080       0.054         6.365      -0.100     0.008      0.007
(1)                   (0.071)     (0.185)       (5.424)    (0.001)    (0.673)    (0.705)

Commercial paper      0.297       0.102         -0.053     018        0.221      0.001
(1 and 2)             (0.000)     (0.145)       (0.567)    (0.109)    (0.000)    (0.090)

Stock index           0.110       0.102         -0.235     -0.008     0.105      0.007
(1 and 2)             (0.105)     (0.145)       (1.650)    (0.059)    (0.056)    (0.075)

Log real consumption growth rates and asset returns are measured in annual US data, 1889 to 1994. The columns headed "First-stage regressions" report the R² statistics and joint significance levels of the explanatory variables in regressions of returns and consumption growth on the instruments. The columns headed γ and ψ report two-stage least squares instrumental-variables (IV) estimates of the parameters γ and ψ in regressions (8.2.9) and (8.2.10), respectively. The columns headed "Test (8.2.9)" and "Test (8.2.10)" report the R² statistics and joint significance levels of the explanatory variables in regressions of the IV residuals from (8.2.9) and (8.2.10) on the instruments. The instruments include either one lag (in rows marked 1), or one and two lags (in rows marked 1 and 2), of the real commercial paper rate, the real consumption growth rate, and the log dividend-price ratio.

Table 8.2 illustrates two-stage least squares estimation of (8.2.9). In this table the asset returns are the real commercial paper rate and real stock return from Table 8.1, and consumption growth is the annual growth rate of real nondurables and services consumption. The instruments are either one lag, or one and two lags, of the real commercial paper rate, the real consumption growth rate, and the log dividend-price ratio.
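The estimation strategy can be sketched with two-stage least squares in NumPy. This is a minimal illustration on simulated data, not a replication of Table 8.2: the helper `two_stage_least_squares`, the data-generating process, and all parameter values are hypothetical. The simulation builds in the key feature of (8.2.9), an error term correlated with realized consumption growth but not with lagged instruments, so OLS is biased while 2SLS is consistent. Standard errors are omitted.

```python
import numpy as np

def two_stage_least_squares(y, x, Z):
    """2SLS for y = a + b*x + error, instrumenting the single regressor x with columns of Z.

    A constant is included in both stages. Returns (a, b) and residuals y - a - b*x.
    """
    T = len(y)
    Zc = np.column_stack([np.ones(T), Z])            # instruments, including a constant
    pi, *_ = np.linalg.lstsq(Zc, x, rcond=None)      # first stage: x on instruments
    x_hat = Zc @ pi                                  # fitted values of the regressor
    Xh = np.column_stack([np.ones(T), x_hat])
    coef, *_ = np.linalg.lstsq(Xh, y, rcond=None)    # second stage: y on fitted x
    resid = y - coef[0] - coef[1] * x                # residuals use actual, not fitted, x
    return coef, resid

# Hypothetical data: consumption growth is partly forecastable from lagged variables,
# and the return error is correlated with the consumption surprise.
rng = np.random.default_rng(0)
T, gamma_true = 5000, 3.0
Z = rng.standard_normal((T, 2))                      # variables known at time t
surprise = rng.standard_normal(T)                    # consumption-growth innovation
dc = 0.5 * Z[:, 0] + 0.3 * Z[:, 1] + surprise
eta = 0.8 * surprise + rng.standard_normal(T)        # correlated with realized dc, not with Z
r = 0.02 + gamma_true * dc + eta                     # regression (8.2.9)

coef_iv, resid_iv = two_stage_least_squares(r, dc, Z)
ols_slope = np.polyfit(dc, r, 1)[0]                  # OLS ignores the endogeneity
print(f"2SLS gamma = {coef_iv[1]:.3f}, OLS slope = {ols_slope:.3f}")
```

With these settings OLS overstates γ by roughly cov(Δc, η)/var(Δc), while the 2SLS estimate stays close to the true value of 3.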

For each asset and set of instruments, the table first reports the R² statistics and significance levels for first-stage regressions of the asset return and consumption growth rate onto the instruments. The table then shows the IV estimate of γ with its standard error and, in the column headed "Test (8.2.9)", the R² statistic for a regression of the residual on the instruments together with the associated significance level of a test of the overidentifying restrictions of the model. This test is discussed at the end of Section A.1 of the Appendix.
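One common way to compute such a test is the T·R² statistic: regress the IV residuals on the instruments and compare T times the R² with a χ² distribution whose degrees of freedom equal the number of overidentifying restrictions. The sketch below assumes that form (the Appendix may present the test differently), a single endogenous regressor plus a constant, and homoskedastic errors. The function name `overid_test` is hypothetical.

```python
import numpy as np
from scipy import stats

def overid_test(resid, Z, n_params=2):
    """T * R^2 test of overidentifying restrictions after an IV regression.

    resid:    IV residuals, shape (T,)
    Z:        instruments excluding the constant, shape (T, K)
    n_params: number of estimated coefficients (here a constant and one slope)
    """
    T = len(resid)
    Zc = np.column_stack([np.ones(T), Z])            # instruments plus a constant
    b, *_ = np.linalg.lstsq(Zc, resid, rcond=None)   # regress residuals on instruments
    fitted = Zc @ b
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    stat = T * r2
    df = Zc.shape[1] - n_params                      # number of overidentifying restrictions
    return r2, stat, stats.chi2.sf(stat, df)

# Hypothetical residuals that remain forecastable by the first instrument,
# so the overidentifying restrictions should be rejected.
rng = np.random.default_rng(1)
T = 1000
Z = rng.standard_normal((T, 3))                      # e.g. three lagged variables
bad_resid = 0.5 * Z[:, 0] + rng.standard_normal(T)
r2, stat, pval = overid_test(bad_resid, Z)
print(f"R^2 = {r2:.3f}, T*R^2 = {stat:.1f}, p-value = {pval:.2g}")
```

With three instruments plus a constant and two estimated coefficients, there are two overidentifying restrictions, matching the one-lag rows of Table 8.2.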

Table 8.2 shows strong evidence that the real commercial paper rate is forecastable, and weaker evidence that the real stock return is forecastable. There is very little evidence that consumption growth is forecastable.† The IV estimates of γ are negative rather than positive as implied by the underlying theory, but they are not significantly different from zero. The overidentifying restrictions of the model are strongly rejected when the commercial paper rate is used as the asset.

One problem with IV estimation of (8.2.9) is that the instruments are 
only very weakly correlated with the regressor because consumption growth 
is hard to forecast in this data set. Nelson and Startz (1990) have shown 
that in this situation asymptotic theory can be a poor guide to inference 
in finite samples; the asymptotic standard error of the coefficient tends 
to be too small and the overidentifying restrictions of the model may be 
rejected even when it is true. To circumvent this problem, one can reverse 
the regression (8.2.9) and estimate 


$$\Delta c_{t+1} = \tau_i + \psi\, r_{i,t+1} + \xi_{i,t+1}. \qquad (8.2.10)$$


If the orthogonality conditions hold, then, as we have already discussed, the estimate of ψ in (8.2.10) will asymptotically be the reciprocal of the estimate of γ in (8.2.9). In a finite sample, however, if γ is large and ψ is small, then IV estimates of (8.2.10) will be better behaved than IV estimates of (8.2.9).

In Table 8.2, ψ is estimated to be negative, like γ, but is small and insignificantly different from zero. The overidentifying restrictions of the model ("Test (8.2.10)") are not rejected when only 1 lag of the instruments is used, and they are rejected at the 10% level when 2 lags of the instruments are used. Table 8.2 also shows that the residual from the IV regression is only marginally less forecastable than consumption growth itself. These results are not particularly encouraging for the consumption model, but equally they do not provide strong evidence against the view that investors have power utility with a very high γ (which would explain the equity premium puzzle) and a correspondingly small ψ (which would explain the unpredictability of consumption growth in the face of predictable asset returns).


†In postwar quarterly data there is stronger evidence of predictable variation in consumption growth. Campbell and Mankiw (1990) show that this variation is associated with predictable income growth.

8.2.2 Power Utility and Generalized Method of Moments


So far we have worked with a restrictive loglinear specification and have 
discussed cross-sectional and time-series aspects of the data separately. The 
Generalized Method of Moments (GMM) of Hansen (1982), applied to the 
consumption CAPM by Hansen and Singleton (1982), allows us to estimate 
and test the power utility model without making distributional assumptions 
and without ignoring either dimension of the data. Section A.2 of the 
Appendix summarizes the GMM approach, and explains its relation to linear 
instrumental variables. 

When GMM is used to estimate the consumption CAPM with power utility, using the same asset returns and instruments as in Table 8.2 and assuming white noise errors, the overidentifying restrictions of the model are strongly rejected whenever stocks and commercial paper are included together in the system. The weak evidence against the model in Table 8.2 becomes much stronger. This occurs because there is predictable variation in excess returns on stocks over commercial paper.‡ Such predictable variation is ruled out by the loglinear homoskedastic model (8.2.5) but could in principle be explained by a heteroskedastic model in which conditional covariances of asset returns with consumption are correlated with the forecasting variables.§ The GMM system allows for this possibility, without linearizing the model or imposing distributional assumptions, so the GMM rejection is powerful evidence against the standard consumption CAPM with power utility.
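The GMM moment conditions for power utility are easy to write down: for each asset i and instrument z_t, the sample average of [δ(C_{t+1}/C_t)^{−γ}(1 + R_{i,t+1}) − 1]z_t should be close to zero. The sketch below computes these moments and a first-step GMM objective with an identity weighting matrix. It is an assumption-laden illustration: the function names are hypothetical, and the data are simulated so that the Euler equation holds at (δ, γ) = (0.98, 5). A real application would minimize over (δ, γ), use a second-step efficient weighting matrix, and form Hansen's J statistic from the minimized objective.

```python
import numpy as np

def euler_errors(params, growth, returns):
    """Power-utility Euler errors u_{t+1} = delta * G**(-gamma) * (1 + R) - 1.

    growth:  (T,) gross consumption growth C_{t+1}/C_t
    returns: (T, N) gross returns 1 + R_{i,t+1}
    """
    delta, gamma = params
    sdf = delta * growth ** (-gamma)                 # stochastic discount factor M_{t+1}
    return sdf[:, None] * returns - 1.0

def gmm_objective(params, growth, returns, instruments, W=None):
    """First-step GMM criterion g'Wg, with g = average over t of u_{t+1} (x) z_t.

    Row t of `instruments` holds z_t, aligned with growth and returns dated t+1.
    """
    u = euler_errors(params, growth, returns)                          # (T, N)
    g = np.einsum("tn,tk->nk", u, instruments).ravel() / len(growth)    # (N*K,)
    if W is None:
        W = np.eye(g.size)                           # identity weighting (first step)
    return float(g @ W @ g)

# Hypothetical data constructed so the Euler equation holds at (delta, gamma) = (0.98, 5).
rng = np.random.default_rng(2)
T, delta0, gamma0 = 2000, 0.98, 5.0
growth = np.exp(0.018 + 0.033 * rng.standard_normal(T))
z = np.column_stack([np.ones(T), rng.standard_normal(T)])   # constant plus one lagged variable
noise = 0.1 * rng.standard_normal((T, 2))                   # mean-zero pricing errors
returns = (1.0 + noise) / (delta0 * growth[:, None] ** (-gamma0))

print("objective at true parameters:", gmm_objective((0.98, 5.0), growth, returns, z))
print("objective at gamma = 10:     ", gmm_objective((0.98, 10.0), growth, returns, z))
```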

Faced with this evidence, economists have explored two main directions for research. A first possibility is that market frictions invalidate the standard consumption CAPM. The measured returns used to test the model may not actually be available to investors, who may face transactions costs and constraints on their ability to borrow or shortsell assets. Market frictions may also make aggregate consumption an inadequate proxy for the consumption of stock market investors. A second possibility is that investors have more complicated preferences than the simple power specification. We explore each of these possibilities in the next two sections.


8.3 Market Frictions 


We now consider various market frictions that may be relevant for asset pricing. If investors face transactions costs or limits on their ability to borrow or sell assets short, then they may have only a limited ability to exploit the empirical patterns in returns. In Section 8.3.1 we show how this can alter the basic Hansen and Jagannathan (1991) analysis of the volatility of the stochastic discount factor.

‡Recall that in Chapter 7 we presented evidence that the dividend-price ratio forecasts excess stock returns. The dividend-price ratio is one of the instruments used here.

§One can understand this by considering a heteroskedastic version of the linearized model (8.2.5) in which the variances have time subscripts. Campbell (1987) and Harvey (1989) apply GMM to models of this type which impose the restriction that asset returns' conditional means are linear functions of their conditional second moments. We discuss this work further in Chapter 12.

The same sorts of frictions may make aggregate consumption an inadequate proxy for the consumption of stock market investors. In Section 8.3.2 we discuss some of the evidence on this point, and then follow Campbell (1993a, 1996) in developing a representative-agent asset pricing theory in which the consumption of the representative investor need not be observed. The theory uses a generalization of power utility, due to Epstein and Zin (1989, 1991) and Weil (1989), that breaks the link between risk aversion and intertemporal substitution. The resulting model, in the spirit of Merton (1973a), is a multifactor model with restrictions on the risk prices of the factors; hence it can be tested using the econometric methods discussed in Chapter 6.


8.3.1 Market Frictions and Hansen-Jagannathan Bounds 


The volatility bounds of Hansen and Jagannathan (1991), discussed in Section 8.1.1, assume that investors can freely trade in all assets without incurring transactions costs and without limitations on borrowing or short sales. These assumptions are obviously rather extreme, but they have been relaxed by He and Modest (1995) and Luttmer (1994). To understand the approach, note that if asset i cannot be sold short, then the standard equality restriction $E[(1 + R_{i,t+1})M_{t+1}] = 1$ must be replaced by an inequality restriction

$$E[(1 + R_{i,t+1})M_{t+1}] \le 1. \qquad (8.3.1)$$


If the inequality is strict, then an investor would like to sell the asset but is 
prevented from doing so by the shortsales constraint. Instead, the investor 
holds a zero position in the asset. 

Shortsales constraints may apply to some assets but not others; if they apply to all assets, then they can be interpreted as a solvency constraint, in that they ensure that an investor cannot make investments today that deliver negative wealth tomorrow. Assuming limited liability for all risky assets, so that the minimum value of each asset tomorrow is zero, a portfolio with nonnegative weights in every asset also has a minimum value of zero tomorrow.

Investors may also face borrowing constraints that limit their ability to 
sell assets to finance consumption today. Such constraints deliver inequality 
restrictions of the form (8.3.1) for all raw asset returns, but the standard 
equality constraint holds for excess returns since the investor is free to short 
one asset in order to take a long position in another asset. 

Shortsales constraints can also be used to model proportional transactions costs of the type that might result from a bid-ask spread that does not depend on the size of a trade. When there are transactions costs, the after-transaction-cost return on an asset bought today and sold tomorrow is not the negative of the after-transaction-cost return on the same asset sold today and bought back tomorrow. These two returns can be measured separately and can both be included in the set of returns if they are made subject to shortsales constraints.
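A hypothetical numerical example makes the point concrete. The convention that a short position's return is measured per dollar of sale proceeds, and the omission of margin interest, are assumptions of this sketch.

```python
def round_trip_returns(bid_today, ask_today, bid_tomorrow, ask_tomorrow):
    """Net one-period returns after a bid-ask spread.

    long:  buy at today's ask, sell at tomorrow's bid
    short: sell at today's bid, buy back at tomorrow's ask (per dollar of proceeds)
    """
    long_ret = bid_tomorrow / ask_today - 1.0
    short_ret = 1.0 - ask_tomorrow / bid_today
    return long_ret, short_ret

# Without a spread the two returns are exact negatives of each other...
print(round_trip_returns(100.0, 100.0, 105.0, 105.0))
# ...but with a spread both positions do worse, so neither return is minus the other.
print(round_trip_returns(99.5, 100.5, 104.5, 105.5))
```

Because each of these returns would be a bad deal to reverse, each enters the system only through an inequality of the form (8.3.1).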

In the presence of shortsales constraints, the vector equality (8.1.8) is 
replaced by another vector equality 


$$\theta = E[(\iota + R_{t+1})M_{t+1}], \qquad (8.3.2)$$


where θ is an unknown vector. The model implies various restrictions on θ, such as the restriction that θ_i ≤ 1 for all i. Volatility bounds can now be found for each value of the mean E[M_{t+1}] by choosing, subject to the restrictions, the value of θ that delivers the lowest variance for M_{t+1}. He and Modest (1995) find that by combining borrowing constraints, a restriction on the short sale of Treasury bills, and asset-specific transaction costs they can greatly reduce the volatility bound on the stochastic discount factor.
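The bound can be computed as a small quadratic program. For a given mean v = E[M], the smallest variance of any M with E[M(ι + R)] = θ is (θ − vμ)′Σ⁻¹(θ − vμ), where μ and Σ are the mean vector and covariance matrix of gross returns; with shortsales constraints one also minimizes over θ_i ≤ 1 for the constrained assets. The sketch below does this with SciPy's bounded L-BFGS-B optimizer. The function names and two-asset numbers are hypothetical, and only a short-sale restriction is imposed, not the borrowing constraints or transactions costs that He and Modest (1995) also consider.

```python
import numpy as np
from scipy.optimize import minimize

def hj_min_variance(theta, v, mu, Sigma):
    """Smallest Var(M) over SDFs with E[M] = v and E[M * (1 + R)] = theta."""
    d = theta - v * mu
    return float(d @ np.linalg.solve(Sigma, d))

def hj_min_variance_shortsales(v, mu, Sigma, constrained):
    """Same bound when short-sale-constrained assets only need theta_i <= 1, as in (8.3.1).

    Unconstrained assets keep the frictionless equality theta_i = 1.
    """
    constrained = np.asarray(constrained, dtype=bool)
    base = np.ones(len(mu))

    def objective(free):
        theta = base.copy()
        theta[constrained] = free
        return hj_min_variance(theta, v, mu, Sigma)

    n = int(constrained.sum())
    res = minimize(objective, x0=np.ones(n), method="L-BFGS-B", bounds=[(None, 1.0)] * n)
    # theta = 1 is always feasible, so never report more than the frictionless bound
    return min(float(res.fun), hj_min_variance(base, v, mu, Sigma))

# Hypothetical two-asset example: a stock and a Treasury bill that cannot be sold short.
mu = np.array([1.08, 1.01])                          # mean gross returns
Sigma = np.array([[0.04, 0.002], [0.002, 0.0004]])   # covariance matrix of returns
v = 0.99                                             # candidate mean of the SDF
free = hj_min_variance(np.ones(2), v, mu, Sigma)
constrained = hj_min_variance_shortsales(v, mu, Sigma, [False, True])
print(f"std bound: frictionless {free**0.5:.3f}, bill short-sale constraint {constrained**0.5:.3f}")
```

In this example the optimizer lowers the bill's θ below one, which is exactly the case where the investor would like to short the bill but cannot, and the required volatility of M falls accordingly.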

This analysis is extremely conservative in that θ is chosen to minimize the volatility bound without asking what underlying equilibrium would support this choice for θ. If there are substantial transactions costs, for example, then even risk-neutral traders will not sell one asset to buy another asset with a higher return unless the return difference exceeds the transactions costs. But the one-period transactions costs will not be relevant if traders can buy the high-return asset and hold it for many periods, or if a trader has new wealth to invest and must pay the cost of purchasing one asset or the other. Thus the work of He and Modest (1995) and Luttmer (1994) is exploratory, a way to get a sense for the extent to which market frictions loosen the bounds implied by a frictionless market.

Some authors have tried to solve explicitly for the asset prices that are implied by equilibrium models with transactions costs. This is a difficult task because transactions costs make the investor's decision problem comparatively intractable except in very special cases (see Davis and Norman (1990)). Aiyagari and Gertler (1991), Amihud and Mendelson (1986), Constantinides (1986), Heaton and Lucas (1996), and Vayanos (1995) have begun to make some progress on this topic.


8.3.2 Market Frictions and Aggregate Consumption Data


The rejection of the standard consumption CAPM may be due in part to difficulties in measuring aggregate consumption. The consumption CAPM applies to true consumption measured at a point in time, but the available data are time-aggregated and measured with error. Wilcox (1992) describes the sampling procedures used to construct consumption data, while Grossman, … speaking, these data problems can cause asset returns weighted by measured marginal utility of consumption, $(1 + R_{i,t+1})(C_{t+1}/C_t)^{-\gamma}$, to be forecastable in the short run but not the long run. Thus one can allow for such problems by lagging the instruments more than one period when testing the model.‖ Doing this naturally weakens the evidence against the consumption CAPM, but the model is still rejected at conventional significance levels unless very long lags are used.

A more radical suggestion is that aggregate consumption is not an adequate proxy for the consumption of stock market investors even in the long run. One simple explanation is that there are two types of agents in the economy: constrained agents who are prevented from trading in asset markets and simply consume their labor income each period, and unconstrained agents. The consumption of the constrained agents is irrelevant to the determination of equilibrium asset prices, but it may be a large fraction of aggregate consumption. Campbell and Mankiw (1990) argue that predictable variation in consumption growth, correlated with predictable variation in income growth, suggests an important role for constrained agents, while Mankiw and Zeldes (1991) use panel data to show that the consumption of stockholders is more volatile and more highly correlated with the stock market than the consumption of nonstockholders.

The constrained agents in the above model do not directly influence asset prices, because they are assumed not to hold or trade financial assets. Another strand of the literature argues that there may be some investors who buy and sell stocks for exogenous, perhaps psychological, reasons. These noise traders can influence stock prices because other investors, who are rational utility-maximizers, must be induced to accommodate their shifts in demand. If utility-maximizing investors are risk-averse, then they will only buy stocks from noise traders who wish to sell if stock prices fall and expected stock returns rise; conversely, they will only sell stocks to noise traders who wish to buy if stock prices rise and expected stock returns fall. Campbell and Kyle (1993), Cutler, Poterba, and Summers (1991), DeLong et al. (1990a, 1990b), and Shiller (1984) develop this model in some detail. The model implies that rational investors do not hold the market portfolio (instead they shift in and out of the stock market in response to changing demand from noise traders) and do not consume aggregate consumption, since some consumption is accounted for by noise traders. This makes the model hard to test without having detailed information on the investment strategies of different market participants.

‖Campbell and Mankiw (1990) discuss this in the context of a linearized model. Breeden, Gibbons, and Litzenberger (1989) make a related point, arguing that at short horizons one should replace consumption with the return on a portfolio constructed to be highly correlated with longer-run movements in consumption. Brainard, Nelson, and Shapiro (1991) find that the consumption CAPM works better at long horizons than at short horizons.
It is also possible that utility-maximizing stock market investors are heterogeneous in important ways. If investors are subject to large idiosyncratic risks in their labor income and can share these risks only indirectly by trading a few assets such as stocks and Treasury bills, their individual consumption paths may be much more volatile than aggregate consumption. Even if individual investors have the same power utility function, so that any individual's consumption growth rate raised to the power −γ would be a valid stochastic discount factor, the aggregate consumption growth rate raised to the power −γ may not be a valid stochastic discount factor.¶ Problem 8.3, based on Mankiw (1986), explores this effect in a simple two-period model.
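A quick simulation illustrates the Jensen's-inequality effect behind this point. Assume, purely for illustration, that each agent's consumption growth equals aggregate growth times an independent lognormal shock with mean one. Then the average of individual marginal-utility ratios (C_{i,t+1}/C_{i,t})^{−γ} exceeds the ratio built from aggregate consumption growth by the factor exp(γ(γ + 1)s²/2), where s is the shock volatility. The function name and all parameter values are hypothetical, and time preference is omitted because it scales both sides equally.

```python
import numpy as np

def average_individual_sdf(agg_growth, gamma, s, n_agents, rng):
    """Average of individual (C_{i,t+1}/C_{i,t})**(-gamma) when individual growth is aggregate
    growth times an idiosyncratic lognormal shock with mean one (an assumption)."""
    eps = np.exp(-0.5 * s**2 + s * rng.standard_normal(n_agents))   # E[eps] = 1
    return np.mean((agg_growth * eps) ** (-gamma))

gamma, s, G = 5.0, 0.1, 1.02       # hypothetical risk aversion, shock volatility, aggregate growth
rng = np.random.default_rng(3)
individual = average_individual_sdf(G, gamma, s, 200_000, rng)
aggregate = G ** (-gamma)          # ratio built from aggregate consumption growth
analytic = aggregate * np.exp(0.5 * gamma * (gamma + 1) * s**2)   # lognormal closed form
print(f"aggregate {aggregate:.4f}, average individual {individual:.4f}, analytic {analytic:.4f}")
```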

Recent research has begun to explore the empirical relevance of imperfect risk-sharing for asset pricing. Heaton and Lucas (1996) calibrate individual income processes to micro data from the Panel Study of Income Dynamics (PSID). Because the PSID data show that idiosyncratic income variation is largely transitory, Heaton and Lucas find that investors can minimize its effects on their consumption by borrowing and lending. Thus they find only limited effects on asset pricing unless they restrict borrowing or assume the presence of large transactions costs. Constantinides and Duffie (1996) construct a theoretical model in which idiosyncratic shocks have permanent effects on income; they show that this type of income risk can have large effects on asset pricing.
Given this evidence, it seems important to develop empirically testable intertemporal asset pricing models that do not rely so heavily on aggregate consumption data. One approach is to substitute consumption out of the consumption CAPM to obtain an asset pricing model that relates mean returns to covariances with the underlying state variables that determine consumption. The strategy is to try to characterize the preferences that an investor would have to have in order to be willing to buy and hold the aggregate wealth portfolio, without necessarily assuming that this investor also consumes aggregate consumption.

There are several classic asset pricing models of this type set in continuous time, most notably Cox, Ingersoll, and Ross (1985a) and Merton (1973a). But those models are hard to work with empirically. Campbell (1993a) suggests a simple way to get an empirically tractable discrete-time model using the utility specification developed by Epstein and Zin (1989, 1991) and Weil (1989), which we now summarize.

¶This is an example of Jensen's Inequality. Since marginal utility is nonlinear, the average of investors' marginal utilities of consumption is not generally the same as the marginal utility of average consumption. This problem disappears when investors' individual consumption streams are perfectly correlated with one another, as they will be in a complete markets setting. Grossman and Shiller (1982) point out that it also disappears in a continuous-time model when the processes for individual consumption streams and asset prices are diffusions.


Separating Risk Aversion and Intertemporal Substitution 
Epstein, Zin, and Weil build on the approach of Kreps and Porteus (1978) to develop a more flexible version of the basic power utility model. That model is restrictive in that it makes the elasticity of intertemporal substitution, ψ, the reciprocal of the coefficient of relative risk aversion, γ. Yet it is not clear that these two concepts should be linked so tightly. Risk aversion describes the consumer's reluctance to substitute consumption across states of the world and is meaningful even in an atemporal setting, whereas the elasticity of intertemporal substitution describes the consumer's willingness to substitute consumption over time and is meaningful even in a deterministic setting. The Epstein-Zin-Weil model retains many of the attractive features of power utility but breaks the link between the parameters γ and ψ.

The Epstein-Zin-Weil objective function is defined recursively by 


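$$U_t = \left[(1-\delta)\,C_t^{\frac{1-\gamma}{\theta}} + \delta\left(E_t\!\left[U_{t+1}^{1-\gamma}\right]\right)^{\frac{1}{\theta}}\right]^{\frac{\theta}{1-\gamma}}, \qquad (8.3.3)$$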


where θ ≡ (1 − γ)/(1 − 1/ψ). When θ = 1 the recursion (8.3.3) becomes linear; it can then be solved forward to yield the familiar time-separable power utility model.
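A one-step version of the recursion (8.3.3) can be coded directly when next-period utility takes finitely many values. The sketch below (function name and two-state numbers hypothetical) checks two properties used in the text: when ψ = 1/γ, so that θ = 1, the recursion is linear in U^{1−γ}, as in power utility; and the aggregator is homogeneous of degree one, the scale independence invoked later in this section.

```python
import numpy as np

def epstein_zin_utility(c, next_utils, probs, delta, gamma, psi):
    """One step of recursion (8.3.3) with finitely many next-period utility states.

    U_t = [(1-delta) C_t^((1-gamma)/theta)
           + delta * (E_t[U_{t+1}^(1-gamma)])^(1/theta)]^(theta/(1-gamma)),
    theta = (1-gamma)/(1-1/psi); requires gamma != 1 and psi != 1.
    """
    theta = (1.0 - gamma) / (1.0 - 1.0 / psi)
    moment = np.dot(probs, np.asarray(next_utils, dtype=float) ** (1.0 - gamma))
    inner = (1.0 - delta) * c ** ((1.0 - gamma) / theta) + delta * moment ** (1.0 / theta)
    return inner ** (theta / (1.0 - gamma))

probs = np.array([0.5, 0.5])      # hypothetical two-state distribution
next_u = np.array([0.9, 1.2])     # hypothetical next-period utilities
c, delta = 1.0, 0.95

# Power utility: psi = 1/gamma gives theta = 1, so U^(1-gamma) satisfies a linear recursion.
u_power = epstein_zin_utility(c, next_u, probs, delta, gamma=2.0, psi=0.5)
linear = (1 - delta) * c ** (1 - 2.0) + delta * np.dot(probs, next_u ** (1 - 2.0))
print(u_power ** (1 - 2.0), linear)

# Separate risk aversion and EIS: doubling consumption and future utility doubles U_t.
u_ez = epstein_zin_utility(c, next_u, probs, delta, gamma=10.0, psi=1.5)
u_scaled = epstein_zin_utility(2 * c, 2 * next_u, probs, delta, gamma=10.0, psi=1.5)
print(u_scaled / u_ez)
```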
The intertemporal budget constraint for a representative agent can be 
written as 
$$W_{t+1} = (1 + R_{m,t+1})(W_t - C_t), \qquad (8.3.4)$$


where $W_{t+1}$ is the representative agent's wealth, and $(1 + R_{m,t+1})$ is the return on the "market" portfolio of all invested wealth. This form of the budget constraint is appropriate for a complete-markets model in which wealth includes human capital as well as financial assets. Epstein and Zin use a complicated dynamic programming argument to show that (8.3.3) and (8.3.4) together imply an Euler equation of the form**


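$$E_t\left[\left\{\delta\left(\frac{C_{t+1}}{C_t}\right)^{-1/\psi}\right\}^{\theta}\left\{\frac{1}{1+R_{m,t+1}}\right\}^{1-\theta}(1 + R_{i,t+1})\right] = 1. \qquad (8.3.5)$$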


If we assume that asset returns and consumption are homoskedastic and jointly lognormal, then this implies that the riskless real interest rate is

$$r_{f,t+1} = -\log\delta + \frac{1}{\psi}E_t[\Delta c_{t+1}] + \frac{\theta-1}{2}\sigma_m^2 - \frac{\theta}{2\psi^2}\sigma_c^2. \qquad (8.3.6)$$


**There are in fact typos in equations (10) through (12) of Epstein and Zin (1991), which give intermediate steps in the derivation.




The premium on risky assets, including the market portfolio itself, is 


$$E_t[r_{i,t+1}] - r_{f,t+1} + \frac{\sigma_i^2}{2} = \theta\,\frac{\sigma_{ic}}{\psi} + (1-\theta)\,\sigma_{im}. \qquad (8.3.7)$$


This says that the risk premium on asset i is a weighted combination of asset i's covariance with consumption growth (divided by the elasticity of intertemporal substitution ψ) and asset i's covariance with the market return. The weights are θ and 1 − θ, respectively. The Epstein-Zin-Weil model thus nests the consumption CAPM with power utility (θ = 1) and the traditional static CAPM (θ = 0).
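Equations (8.3.6) and (8.3.7) translate into two short functions. The sketch below (names and inputs hypothetical) checks the nesting claims: with ψ = 1/γ, θ = 1, the premium reduces to γσ_ic and the riskless rate reduces to the power-utility formula (8.2.8); with γ = 1 and ψ ≠ 1, θ = 0 and the premium reduces to σ_im. Note that θ is undefined at ψ = 1.

```python
import math

def ez_theta(gamma, psi):
    """theta = (1 - gamma) / (1 - 1/psi); undefined when psi = 1."""
    return (1.0 - gamma) / (1.0 - 1.0 / psi)

def ez_riskless_rate(delta, gamma, psi, exp_dc, var_c, var_m):
    """Riskless rate (8.3.6) under homoskedastic joint lognormality."""
    theta = ez_theta(gamma, psi)
    return (-math.log(delta) + exp_dc / psi
            + 0.5 * (theta - 1.0) * var_m - theta / (2.0 * psi**2) * var_c)

def ez_premium(gamma, psi, cov_ic, cov_im):
    """Risk premium (8.3.7): E_t[r_i] - r_f + var_i/2 = theta*cov_ic/psi + (1-theta)*cov_im."""
    theta = ez_theta(gamma, psi)
    return theta * cov_ic / psi + (1.0 - theta) * cov_im

# Hypothetical covariances with consumption growth and with the market return.
cov_ic, cov_im = 0.002, 0.01
print("power utility (psi = 1/gamma):", ez_premium(4.0, 0.25, cov_ic, cov_im))  # = gamma * cov_ic
print("gamma = 1, psi = 2 (theta = 0):", ez_premium(1.0, 2.0, cov_ic, cov_im))  # = cov_im
```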

It is tempting to use (8.3.7) together with observed data on aggregate consumption and stock market returns to estimate and test the Epstein-Zin-Weil model. Epstein and Zin (1991) report results of this type. In a similar spirit, Giovannini and Weil (1989) use the model to reinterpret the results of Mankiw and Shapiro (1986), who found that betas with the market have greater explanatory power for the cross-sectional pattern of returns than do betas with consumption; this is consistent with a value of θ close to zero. However, this procedure ignores the fact that the intertemporal budget constraint (8.3.4) also links consumption and market returns. We now show that the budget constraint can be used to substitute consumption out of the asset pricing model.


Substituting Consumption Out of the Model
Campbell (1993a) points out that one can loglinearize the intertemporal budget constraint (8.3.4) around the mean log consumption-wealth ratio to obtain

$$\Delta w_{t+1} \approx r_{m,t+1} + k + \left(1 - \frac{1}{\rho}\right)(c_t - w_t), \qquad (8.3.8)$$

where $\rho \equiv 1 - \exp(\overline{c_t - w_t})$ and k is a constant that plays no role in what follows. Combining this with the trivial equality $\Delta w_{t+1} = \Delta c_{t+1} + (c_t - w_t) - (c_{t+1} - w_{t+1})$, solving the resulting difference equation forward, and taking expectations, we can write the budget constraint in the form


$$c_t - w_t = E_t\sum_{j=1}^{\infty}\rho^j\left(r_{m,t+j} - \Delta c_{t+j}\right) + \frac{\rho k}{1-\rho}. \qquad (8.3.9)$$


This equation says that if the consumption-wealth ratio is high, then the agent must expect either high returns on wealth in the future or low consumption growth rates. This follows just from the approximate budget constraint without imposing any behavioral assumptions. It is directly analogous to the linearized formula for the log dividend-price ratio in Chapter 7. Here wealth can be thought of as an asset that pays consumption as its dividend.
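The loglinearization (8.3.8) can be checked numerically against the exact relation implied by (8.3.4), Δw_{t+1} = r_{m,t+1} + log(1 − exp(c_t − w_t)). The sketch assumes, for illustration only, a mean consumption-wealth ratio of 5% and a market log return of 6%; it sets k so that the approximation is exact at the mean, which is the standard first-order Taylor expansion. Because log(1 − e^x) is concave, the linear approximation lies slightly above the exact value away from the mean.

```python
import numpy as np

def wealth_growth_exact(r_m, cw):
    """Exact log wealth growth implied by (8.3.4): r_m + log(1 - exp(c_t - w_t))."""
    return r_m + np.log1p(-np.exp(cw))

def wealth_growth_loglinear(r_m, cw, cw_mean):
    """Loglinear approximation (8.3.8) around the mean log consumption-wealth ratio."""
    rho = 1.0 - np.exp(cw_mean)
    k = np.log(rho) - (1.0 - 1.0 / rho) * cw_mean   # makes the two agree at cw = cw_mean
    return r_m + k + (1.0 - 1.0 / rho) * cw

cw_mean = np.log(0.05)                 # hypothetical: consumption is 5% of wealth on average
cw = cw_mean + np.linspace(-0.2, 0.2, 5)
exact = wealth_growth_exact(0.06, cw)
approx = wealth_growth_loglinear(0.06, cw, cw_mean)
print(np.column_stack([cw - cw_mean, exact, approx, approx - exact]))
```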

If we now combine the budget constraint (8.3.9) with the loglinear Euler equations for the Epstein-Zin-Weil model, (8.3.6) and (8.3.7), we obtain a closed-form solution for consumption relative to wealth:††

$$c_t - w_t = (1-\psi)\,E_t\sum_{j=1}^{\infty}\rho^j r_{m,t+j} + \frac{\rho(k - \mu_m)}{1-\rho}. \qquad (8.3.10)$$

Here μ_m is a constant related to the conditional variances of consumption growth and the market portfolio return. The log consumption-wealth ratio is a constant, plus (1 − ψ) times the discounted value of expected future returns on invested wealth. If ψ is less than one, the consumer is reluctant to substitute intertemporally and the income effect of higher returns dominates the substitution effect, raising today's consumption relative to wealth. If ψ is greater than one, the substitution effect dominates and the consumption-wealth ratio falls when expected returns rise. Thus (8.3.10) extends to a dynamic context the classic comparative statics results of Samuelson (1969).
Equation (8.3.10) implies that the innovation in consumption is

$$c_{t+1} - E_t c_{t+1} = r_{m,t+1} - E_t r_{m,t+1} + (1-\psi)\,(E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j r_{m,t+1+j}. \qquad (8.3.11)$$

An unexpected return on invested wealth has a one-for-one effect on consumption, no matter what the parameters of the utility function. This follows from the scale independence of the objective function (8.3.3). An increase in expected future returns raises consumption if $\psi$ is less than one and lowers it if $\psi$ is greater than one. Equation (8.3.11) also shows when consumption will be smoother than the return on the market. When the market return is mean-reverting, there is a negative correlation between current returns and revisions in expectations of future returns. This reduces the variability of consumption growth if the elasticity of intertemporal substitution $\psi$ is less than one but amplifies it if $\psi$ is greater than one.
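To see the role of $\psi$ numerically, here is a minimal Python sketch of the variance of the consumption innovation in (8.3.11), written as $u + (1-\psi)N$, where $u$ is the unexpected market return and $N$ is news about future returns. The variances and the negative covariance (standing in for a mean-reverting market) are hypothetical values chosen for illustration, not estimates from the text.

```python
def consumption_innovation_variance(var_u, var_news, cov_u_news, psi):
    """Var[u + (1 - psi) * N] for the consumption innovation in (8.3.11).

    u is the unexpected market return and N the revision in discounted
    expected future returns; all inputs are hypothetical illustration values.
    """
    a = 1.0 - psi
    # With cov_u_news < 0, the cross term 2 * a * cov is negative when
    # psi < 1 and positive when psi > 1.
    return var_u + a * a * var_news + 2.0 * a * cov_u_news


# Hypothetical moments; a negative covariance stands in for mean reversion.
VAR_U, VAR_NEWS, COV_U_NEWS = 0.03, 0.01, -0.012

for psi in (0.5, 1.0, 2.0):
    v = consumption_innovation_variance(VAR_U, VAR_NEWS, COV_U_NEWS, psi)
    print(f"psi={psi}: variance of consumption innovation = {v:.4f}")
```

With these inputs the variance is roughly 0.0205 for $\psi = 0.5$, equal to the market-return variance of 0.03 for $\psi = 1$, and roughly 0.064 for $\psi = 2$.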
Campbell (1993a) and Campbell and Koo (1996) explore the accuracy of the loglinear approximation in this context by comparing the approximate analytical solution for optimal consumption with a numerical solution. In an example calibrated to US stock market data, the two solutions are close together provided that the investor's elasticity of intertemporal substitution is less than about 3.

Equation (8.3.11) implies that the covariance of any asset return with consumption growth can be rewritten in terms of covariances with the return on the market and revisions in expectations of future returns on the market:

$$\operatorname{Cov}_t[r_{i,t+1}, \Delta c_{t+1}] \equiv \sigma_{ic} = \sigma_{im} + (1-\psi)\,\sigma_{ih}, \qquad (8.3.12)$$

where

$$\sigma_{ih} \equiv \operatorname{Cov}_t\Big[r_{i,t+1},\ (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j r_{m,t+1+j}\Big]$$

is defined to be the covariance of the return on asset $i$ with "news" about future returns on the market, i.e., revisions in expected future returns.

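Substituting (8.3.12) into (8.3.7) and using the definition of $\theta$ in terms of the underlying parameters $\psi$ and $\gamma$, we obtain a cross-sectional asset pricing formula that makes no reference to consumption:

$$E_t[r_{i,t+1}] - r_{f,t+1} + \frac{\sigma_i^2}{2} = \gamma \operatorname{Cov}_t[r_{i,t+1}, r_{m,t+1}] + (\gamma - 1) \operatorname{Cov}_t\Big[r_{i,t+1},\ (E_{t+1} - E_t)\sum_{j=1}^{\infty} \rho^j r_{m,t+1+j}\Big], \qquad (8.3.13)$$

or, in the notation of (8.3.12),

$$E_t[r_{i,t+1}] - r_{f,t+1} + \frac{\sigma_i^2}{2} = \gamma\,\sigma_{im} + (\gamma - 1)\,\sigma_{ih}. \qquad (8.3.14)$$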
Equation (8.3.14) has several striking features. First, assets can be priced without direct reference to their covariance with consumption growth, using instead their covariances with the return on invested wealth and with news about future returns on invested wealth. This is a discrete-time analogue of Merton's (1973a) continuous-time model in which assets are priced using their covariances with certain hedge portfolios that index changes in the investment opportunity set.

Second, the only parameter of the utility function that enters (8.3.14) is the coefficient of relative risk aversion $\gamma$. The elasticity of intertemporal substitution $\psi$ does not appear once consumption has been substituted out of the model. This is in striking contrast with the important role played by $\psi$ in the consumption-based Euler equation (8.3.7). Intuitively, this result comes from the fact that $\psi$ plays two roles in the theory. A low value of $\psi$ reduces anticipated fluctuations in consumption, but it also increases the risk premium required to compensate for any contribution to these fluctuations. These offsetting effects lead $\psi$ to cancel out of the asset-based pricing formula (8.3.14).
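The cancellation of $\psi$ can be checked term by term. The derivation below assumes, as in the Epstein-Zin-Weil discussion that precedes this passage, that (8.3.7) reads $E_t[r_{i,t+1}] - r_{f,t+1} + \sigma_i^2/2 = (\theta/\psi)\,\sigma_{ic} + (1-\theta)\,\sigma_{im}$ with $\theta \equiv (1-\gamma)/(1 - 1/\psi)$, and substitutes (8.3.12) for $\sigma_{ic}$:

```latex
\begin{aligned}
E_t[r_{i,t+1}] - r_{f,t+1} + \frac{\sigma_i^2}{2}
 &= \frac{\theta}{\psi}\big[\sigma_{im} + (1-\psi)\,\sigma_{ih}\big] + (1-\theta)\,\sigma_{im} \\
 &= \Big[1 - \theta\Big(1 - \frac{1}{\psi}\Big)\Big]\sigma_{im}
    - \theta\Big(1 - \frac{1}{\psi}\Big)\sigma_{ih} \\
 &= \big[1 - (1-\gamma)\big]\sigma_{im} - (1-\gamma)\,\sigma_{ih}
  \;=\; \gamma\,\sigma_{im} + (\gamma - 1)\,\sigma_{ih}.
\end{aligned}
```

Both weights depend on $\psi$ only through the product $\theta(1 - 1/\psi)$, which equals $1 - \gamma$ by the definition of $\theta$.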

Third, (8.3.14) expresses the risk premium, net of the Jensen's Inequality adjustment, as a weighted sum of two terms. The first term is the asset's covariance with the market portfolio; the weight on this term is the coefficient of relative risk aversion $\gamma$. The second term is the asset's covariance with news about future returns on the market; this receives a weight of $\gamma - 1$. When $\gamma$ is less than one, assets that do well when there is good news about future returns on the market have lower mean returns, but when $\gamma$ is greater than one, such assets have higher mean returns. The intuitive explanation is that such assets are desirable because they enable the consumer to profit from improved investment opportunities, but undesirable because they reduce the consumer's ability to hedge against a deterioration in investment opportunities. When $\gamma < 1$ the former effect dominates, and consumers are willing to accept a lower return in order to hold assets that pay off when wealth is most productive. When $\gamma > 1$ the latter effect dominates, and consumers require a higher return to hold such assets.

There are several possible circumstances under which assets can be priced using only their covariances with the return on the market portfolio, as in the logarithmic version of the static CAPM. These cases have been discussed in the literature on intertemporal asset pricing, but (8.3.14) makes it particularly easy to understand them. First, if the coefficient of relative risk aversion $\gamma = 1$, then the opposing effects of covariance with investment opportunities cancel out so that only covariance with the market return is relevant for asset pricing. Second, if the investment opportunity set is constant, then $\sigma_{ih}$ is zero for all assets, so again assets can be priced using only their covariances with the market return. Third, if the return on the market follows a univariate stochastic process, then news about future returns is perfectly correlated with the current return; thus, covariance with the current return is a sufficient statistic for covariance with news about future returns and can be used to price all assets. Campbell (1996a) argues that the first two cases do not describe US data even approximately, but that the third case is empirically relevant.


A Third Look at the Equity Premium Puzzle 


Equation (8.3.14) can be applied to the risk premium on the market itself. When $i = m$, we get

$$E_t[r_{m,t+1}] - r_{f,t+1} + \frac{\sigma_m^2}{2} = \gamma\,\sigma_m^2 + (\gamma - 1)\,\sigma_{mh}. \qquad (8.3.15)$$


When the market return is unforecastable, there are no revisions of expectations in future returns, so $\sigma_{mh} = 0$. In this case the equity premium with the Jensen's Inequality adjustment is just $\gamma\sigma_m^2$, and the coefficient of relative risk aversion can be estimated in the manner of Friend and Blume (1975) by taking the ratio of the equity premium to the variance of the market return. Using the numbers from Table 8.1, the estimate of risk aversion is $0.0575/0.0315 \approx 1.83$. This is the risk-aversion coefficient of an investor with power utility whose wealth is entirely invested in a portfolio with an unforecastable return, a risk premium of 5.75% per year, and a variance of 0.0315 (standard deviation of 17.74% per year). The consumption of such an investor would also have a standard deviation of 17.74% per year. This is far greater than the volatility of measured aggregate consumption in Table 8.1, which explains why the risk-aversion estimate is much lower than the consumption-based estimates discussed earlier.

The Friend and Blume (1975) procedure can be seriously misleading if the market return is serially correlated. If high stock returns are associated with downward revisions of future returns, for example, then $\sigma_{mh}$ is negative in (8.3.15). With $\gamma > 1$, this reduces the equity risk premium associated with any level of $\gamma$ and increases the risk-aversion coefficient needed to explain a given equity premium. Intuitively, when $\sigma_{mh} < 0$ the long-run risk of stock market investment is less than the short-run risk because the market tends to mean-revert. Investors with high $\gamma$ care about long-run risk rather than short-run risk, so the Friend and Blume calculation overstates risk and correspondingly understates the risk aversion needed to justify the equity premium.
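Solving (8.3.15) for $\gamma$ gives $\gamma = (P + \sigma_{mh})/(\sigma_m^2 + \sigma_{mh})$, where $P$ denotes the Jensen-adjusted equity premium on the left-hand side. The Python sketch below applies this to the Table 8.1 figures quoted above, treating 0.0575 as the Jensen-adjusted premium as the text's ratio does; the value $\sigma_{mh} = -0.025$ is a hypothetical number chosen only to show the direction of the effect, not an estimate.

```python
def implied_risk_aversion(premium, var_m, sigma_mh=0.0):
    """Solve (8.3.15), premium = gamma*var_m + (gamma - 1)*sigma_mh, for gamma.

    premium:  Jensen-adjusted equity premium E[r_m] - r_f + var_m / 2
    var_m:    variance of the market return
    sigma_mh: covariance of the market return with news about future returns
    """
    denom = var_m + sigma_mh
    if denom <= 0.0:
        raise ValueError("var_m + sigma_mh must be positive for a finite gamma")
    return (premium + sigma_mh) / denom


# Friend-Blume case: unforecastable market return, so sigma_mh = 0.
print(round(implied_risk_aversion(0.0575, 0.0315), 3))  # 1.825
# Hypothetical mean reversion (sigma_mh < 0) raises the implied gamma.
print(round(implied_risk_aversion(0.0575, 0.0315, sigma_mh=-0.025), 3))
```

In this illustration the negative covariance shrinks the denominator more than the numerator in proportional terms, so the implied coefficient of risk aversion rises to about 5.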

Campbell (1996a) shows that the estimated coefficient of relative risk aversion rises by a factor of ten or more if one allows for the empirically estimated degree of mean-reversion in postwar monthly US data. In long-run annual US data the effect is less dramatic but still goes in the same direction. Campbell also shows that risk-aversion estimates increase if one allows for human capital as a component of wealth. In this sense one can derive the equity premium puzzle without any direct reference to consumption data.


An Equilibrium Multifactor Asset Pricing Model

With a few more assumptions, (8.3.14) can be used to derive an equilibrium multifactor asset pricing model of the type discussed in Chapter 6. We write the return on the market as the first element of a $K$-element state vector $x_{t+1}$. The other elements are variables that are known to the market by the end of period $t+1$ and are relevant for forecasting future returns on the market. We assume that the vector $x_{t+1}$ follows a first-order vector autoregression (VAR):

$$x_{t+1} = A\,x_t + \epsilon_{t+1}. \qquad (8.3.16)$$

The assumption that the VAR is first-order is not restrictive, since a higher-order VAR can always be stacked into first-order form.

Next we define a $K$-element vector $e1$, whose first element is one and whose other elements are all zero. This vector picks out the real stock return $r_{m,t+1}$ from the vector $x_{t+1}$: $r_{m,t+1} = e1'x_{t+1}$, and $r_{m,t+1} - E_t r_{m,t+1} = e1'\epsilon_{t+1}$. The first-order VAR generates simple multiperiod forecasts of future returns:

$$E_t\, r_{m,t+1+j} = e1' A^{j+1} x_t. \qquad (8.3.17)$$
It follows that the discounted sum of revisions in forecast returns can be written as

$$(E_{t+1} - E_t)\sum_{j=1}^{\infty} \rho^j r_{m,t+1+j} = e1' \sum_{j=1}^{\infty} \rho^j A^j \epsilon_{t+1} = e1' \rho A (I - \rho A)^{-1} \epsilon_{t+1} \equiv \phi' \epsilon_{t+1}, \qquad (8.3.18)$$

where $\phi'$ is defined to equal $e1'\rho A(I - \rho A)^{-1}$, a nonlinear function of the VAR coefficients. The elements of the vector $\phi$ measure the importance of each state variable in forecasting future returns on the market. If a particular element $\phi_k$ is large and positive, then a shock to variable $k$ is an important piece of good news about future investment opportunities.
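The loading vector $\phi' = e1'\rho A(I - \rho A)^{-1}$ in (8.3.18) reduces to a single linear solve. The NumPy sketch below uses a hypothetical two-variable VAR (the market return plus one persistent predictor) and a hypothetical $\rho = 0.96$, and cross-checks the closed form against a truncated version of the series $\sum_{j \ge 1} \rho^j e1'A^j$.

```python
import numpy as np


def news_loadings(A, rho):
    """phi from (8.3.18): phi' = e1' rho A (I - rho A)^{-1}.

    Assumes the eigenvalues of rho * A lie inside the unit circle so the
    discounted sum of forecast revisions converges.
    """
    K = A.shape[0]
    e1 = np.zeros(K)
    e1[0] = 1.0  # the market return is the first element of the state vector
    M = rho * A
    # phi' (I - M) = e1' M  is equivalent to  (I - M)' phi = M' e1.
    return np.linalg.solve((np.eye(K) - M).T, M.T @ e1)


# Hypothetical VAR coefficients: row 1 is the market return, row 2 a
# persistent predictor that forecasts the market return.
A = np.array([[0.05, 0.30],
              [0.00, 0.90]])
rho = 0.96

phi = news_loadings(A, rho)

# Cross-check against the truncated series sum_{j>=1} rho^j e1' A^j.
e1 = np.array([1.0, 0.0])
series = sum(rho**j * (e1 @ np.linalg.matrix_power(A, j)) for j in range(1, 400))
print(phi, np.allclose(phi, series))
```

Solving the transposed system avoids forming $(I - \rho A)^{-1}$ explicitly; in this example the predictor's loading $\phi_2$ is positive, so a positive shock to it is good news about future market returns.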

We now define

$$\sigma_{ik} \equiv \operatorname{Cov}_t[r_{i,t+1}, \epsilon_{k,t+1}], \qquad (8.3.19)$$

where $\epsilon_{k,t+1}$ is the $k$th element of $\epsilon_{t+1}$. Since the first element of the state vector is the return on the market, $\sigma_{i1} = \sigma_{im}$. Then (8.3.14) implies that

$$E_t[r_{i,t+1}] - r_{f,t+1} + \frac{\sigma_i^2}{2} = \gamma\,\sigma_{i1} + (\gamma - 1)\sum_{k=1}^{K} \phi_k\,\sigma_{ik} = \sum_{k=1}^{K} \lambda_k\,\sigma_{ik}, \qquad (8.3.20)$$

where $\phi_k$ is the $k$th element of $\phi$. This is a standard $K$-factor asset pricing model of the type discussed in Chapter 6. The contribution of the intertemporal optimization problem is a set of restrictions on the risk prices of the factors. The first factor (the innovation in the market return) has a risk price of $\lambda_1 = \gamma + (\gamma - 1)\phi_1$. The sign of $\phi_1$ is the sign of the correlation between market return innovations and revisions in expected future market returns. As we have already discussed, this sign affects the risk price of the market factor; with a negative $\phi_1$, for example, the market factor risk price is reduced if $\gamma$ is greater than one.

The other factors in this model have risk prices of $\lambda_k = (\gamma - 1)\phi_k$ for $k > 1$. Factors here are innovations in variables that help to forecast the return on the market, and their risk prices are proportional to their forecasting importance as measured by the elements of the vector $\phi$. If a particular variable has a positive value of $\phi_k$, this means that innovations in that variable are associated with good news about future investment opportunities. Such a variable will have a negative risk price if the coefficient of relative risk aversion $\gamma$ is less than one, and a positive risk price if $\gamma$ is greater than one.
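Given $\phi$ and $\gamma$, the restrictions $\lambda_1 = \gamma + (\gamma - 1)\phi_1$ and $\lambda_k = (\gamma - 1)\phi_k$ for $k > 1$ are easy to tabulate. A minimal Python sketch; the loadings in the example are hypothetical, with a negative $\phi_1$ as in the mean-reverting case discussed above.

```python
def risk_prices(phi, gamma):
    """Factor risk prices implied by (8.3.20).

    lambda_1 = gamma + (gamma - 1) * phi_1   (market factor)
    lambda_k = (gamma - 1) * phi_k, k > 1    (return-forecasting factors)
    """
    lam = [(gamma - 1.0) * p for p in phi]
    lam[0] += gamma
    return lam


phi = [-0.5, 0.8]  # hypothetical news loadings
for gamma in (0.5, 1.0, 4.0):
    print(gamma, [round(x, 3) for x in risk_prices(phi, gamma)])
```

With $\gamma = 4$ the market factor's price falls below $\gamma$ because $\phi_1 < 0$; with $\gamma = 1$ only the market factor is priced; with $\gamma = 0.5$ the forecasting factor, whose $\phi_2$ is positive, carries a negative price.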

Campbell (1996a) estimates this model on long-term annual and post-World War II monthly US stock market data. He estimates $\phi_1$ to be negative and large in absolute value, so that the price of stock market risk $\lambda_1$ is much smaller than the coefficient of risk aversion $\gamma$. The other factors in the model have imprecisely estimated risk prices. Although some of these risk prices are substantial in magnitude, the other factors have minor effects on the mean returns of the assets in the study, because these assets typically have small covariances with the other factors.


8.4 More General Utility Functions 


One straightforward response to the difficulties of the standard consump- 
tion CAPM is to generalize the utility function. We have already discussed 
the Epstein-Zin-Weil model, but there are other plausible ways to vary the 
utility specification while retaining the attractive scale-independence prop- 
erty of power utility. 

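For example, the utility function may be nonseparable in consumption and some other good. This is easy to handle in a loglinear model if utility is Cobb-Douglas, so that the marginal utility of consumption can be written as

$$U_C(C_t, X_t) = C_t^{-\gamma} X_t^{\gamma_x} \qquad (8.4.1)$$

for some good $X_t$ and parameter $\gamma_x$. The Euler equation now becomes

$$1 = E_t\left[(1 + R_{i,t+1})\, \delta \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} \left(\frac{X_{t+1}}{X_t}\right)^{\gamma_x}\right]. \qquad (8.4.2)$$

Assuming joint lognormality and homoskedasticity, this can be written as

$$E_t[r_{i,t+1}] - r_{f,t+1} + \frac{\sigma_i^2}{2} = \gamma \operatorname{Cov}_t[r_{i,t+1}, \Delta c_{t+1}] - \gamma_x \operatorname{Cov}_t[r_{i,t+1}, \Delta x_{t+1}]. \qquad (8.4.3)$$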


Eichenbaum, Hansen, and Singleton (1988) have considered a model of this form where $X_t$ is leisure. Aschauer (1985) and Startz (1989) have developed models in which $X_t$ is government spending and the stock of durable goods, respectively. Unfortunately, none of these extra variables greatly improves the ability of the consumption CAPM to fit the data. The difficulty is that, at least in data since World War II, these variables are not noisy enough to have much effect on the intertemporal marginal rate of substitution.

Also, as Campbell and Mankiw (1990) point out, in postwar data there is predictable variation in consumption growth that is uncorrelated with predictable variation in real interest rates even after one allows for predictable variation in leisure, government spending, or durable goods.

8.4.1 Habit Formation

A more promising variation of the basic model is to allow for nonseparability in utility over time. Constantinides (1990) and Sundaresan (1989) have argued for the importance of habit formation, a positive effect of today's consumption on tomorrow's marginal utility of consumption. Here we discuss some simple ways to implement this idea.

Several modeling issues arise at the outset. We write the period utility function as $U(C_t, X_t)$, where $X_t$ is the time-varying habit or subsistence level. The first issue is the functional form for $U(\cdot)$. Abel (1990, 1996) has proposed that $U(\cdot)$ should be a power function of the ratio $C_t/X_t$, while Campbell and Cochrane (1995), Constantinides (1990), and Sundaresan (1989) have used a power function of the difference $C_t - X_t$. The second issue is the effect of an agent's own decisions on future levels of habit. In standard internal-habit models such as those in Constantinides (1990) and Sundaresan (1989), habit depends on an agent's own consumption and the agent takes account of this when choosing how much to consume. In external-habit models such as those in Abel (1990, 1996) and Campbell and Cochrane (1995), habit depends on aggregate consumption, which is unaffected by any one agent's decisions. Abel calls this catching up with the Joneses. The third issue is the speed with which habit reacts to individual or aggregate consumption. Abel (1990, 1996), Dunn and Singleton (1986), and Ferson and Constantinides (1991) make habit depend on one lag of consumption, whereas Constantinides (1990), Sundaresan (1989), Campbell and Cochrane (1995), and Heaton (1995) make habit react only gradually to changes in consumption.


Ratio Models 
Following Abel (1990, 1996), suppose that an agent's utility can be written as a power function of the ratio $C_t/X_t$,

$$E_t \sum_{j=0}^{\infty} \delta^j\, \frac{(C_{t+j}/X_{t+j})^{1-\gamma} - 1}{1-\gamma}, \qquad (8.4.4)$$

where $X_t$ summarizes the influence of past consumption levels on today's utility. $X_t$ can be specified as an internal habit or as an external habit. Using one lag of consumption for simplicity, we may have

$$X_t = C_{t-1}^{\kappa}, \qquad (8.4.5)$$

the internal-habit specification where an agent's own past consumption matters, or

$$X_t = \bar{C}_{t-1}^{\kappa}, \qquad (8.4.6)$$

the external-habit specification where aggregate past consumption $\bar{C}_{t-1}$ matters. Since there is a representative agent, in equilibrium the agent's consumption must of course equal aggregate consumption, but the two formulations yield different Euler equations. In both equations the parameter $\kappa$ governs the degree of time-nonseparability.
In the internal-habit specification, the derivation of the Euler equation is complicated by the fact that time-$t$ consumption affects the summation in (8.4.4) through the term dated $t+1$ as well as the term dated $t$. We have

$$U'(C_t) \equiv \frac{\partial U}{\partial C_t} = C_t^{-\gamma} X_t^{\gamma-1} - \delta\kappa \left(\frac{C_{t+1}}{X_{t+1}}\right)^{1-\gamma} C_t^{-1}. \qquad (8.4.7)$$

This is random at time $t$ because it depends on consumption at time $t+1$. Substituting in for $X_t$ and imposing the condition that the agent's own consumption equals aggregate consumption, this becomes

$$U'(C_t) = C_t^{-\gamma} C_{t-1}^{\kappa(\gamma-1)} - \delta\kappa\, C_{t+1}^{1-\gamma} C_t^{\kappa(\gamma-1)-1}. \qquad (8.4.8)$$

If this model is to capture the idea of habit formation, then we need $\kappa(\gamma - 1) > 0$ to ensure that an increase in yesterday's consumption increases the marginal utility of consumption today. The Euler equation can now be written as

$$E_t[U'(C_t)] = \delta\, E_t\big[(1 + R_{i,t+1})\, U'(C_{t+1})\big], \qquad (8.4.9)$$

where the expectations operator on the left-hand side is necessary because of the randomness of $U'(C_t)$.

The analysis simplifies considerably in the external-habit specification. In this case the marginal utility of consumption is just the first term of (8.4.8), and combining it with (8.4.9) gives

$$1 = \delta\, E_t\left[(1 + R_{i,t+1}) \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} \left(\frac{C_t}{C_{t-1}}\right)^{\kappa(\gamma-1)}\right]. \qquad (8.4.10)$$

If we assume homoskedasticity and joint lognormality of asset returns and consumption growth, this implies the following restrictions on risk premia and the riskless real interest rate:

$$r_{f,t+1} = -\log\delta - \frac{\gamma^2\sigma_c^2}{2} + \gamma E_t[\Delta c_{t+1}] - \kappa(\gamma - 1)\Delta c_t, \qquad (8.4.11)$$

$$E_t[r_{i,t+1}] - r_{f,t+1} + \frac{\sigma_i^2}{2} = \gamma\,\sigma_{ic}. \qquad (8.4.12)$$

Equation (8.4.11) says that the riskless real interest rate equals its value under power utility, less $\kappa(\gamma - 1)\Delta c_t$. Holding consumption today and expected consumption tomorrow constant, an increase in consumption yesterday increases the marginal utility of consumption today. This makes the representative agent want to borrow from the future, driving up the real interest rate. Equation (8.4.12) describing the risk premium is exactly the same as (8.2.7), the risk premium formula for the power utility model. The external habit simply adds a term to the Euler equation (8.4.10) which is known at time $t$, and this does not affect the risk premium.
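The mechanics of (8.4.11) can be sketched directly. The Python function below evaluates the log riskless rate under external habit; all parameter values in the example are hypothetical. Setting $\kappa = 0$ recovers the power utility value, and a lower $\Delta c_t$ (higher consumption yesterday, holding today's fixed) raises the rate when $\kappa(\gamma - 1) > 0$.

```python
import math


def riskless_rate_external_habit(delta, gamma, kappa, sigma_c, exp_dc_next, dc_now):
    """Log riskless real rate from (8.4.11):

    r_f = -log(delta) - gamma^2 sigma_c^2 / 2 + gamma E_t[dc_{t+1}]
          - kappa (gamma - 1) dc_t
    """
    return (-math.log(delta)
            - 0.5 * gamma**2 * sigma_c**2
            + gamma * exp_dc_next
            - kappa * (gamma - 1.0) * dc_now)


# Hypothetical parameters (annual, log units).
base = dict(delta=0.98, gamma=2.0, sigma_c=0.03, exp_dc_next=0.02)
power = riskless_rate_external_habit(kappa=0.0, dc_now=0.02, **base)
habit = riskless_rate_external_habit(kappa=0.5, dc_now=0.02, **base)
habit_low_growth = riskless_rate_external_habit(kappa=0.5, dc_now=0.0, **base)
print(round(power, 4), round(habit, 4), round(habit_low_growth, 4))
```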

Abel (1990, 1996) nevertheless argues that catching up with the Joneses can help to explain the equity premium puzzle. This argument is based on two considerations. First, the average level of the riskless rate in (8.4.11) is $-\log\delta - \gamma^2\sigma_c^2/2 + (\gamma - \kappa(\gamma - 1))g$, where $g$ is the average consumption growth rate. When risk aversion $\gamma$ is very large, a positive $\kappa$ reduces the average riskless rate. Thus catching up with the Joneses enables one to increase risk aversion to solve the equity premium puzzle without encountering the risk-free rate puzzle. Second, a positive $\kappa$ is likely to make the riskless real interest rate more variable because of the term $-\kappa(\gamma - 1)\Delta c_t$ in (8.4.11). If one solves for the stock returns implied by the assumption that stock dividends equal consumption, a more variable real interest rate increases the covariance of stock returns and consumption $\sigma_{ic}$ and drives up the equity premium.

The second of these points can be regarded as a weakness rather than a strength of the model. The equity premium puzzle shown in Table 8.1 is that the ratio of the measured equity premium to the measured covariance $\sigma_{ic}$ is large; increasing the value of $\sigma_{ic}$ implied by a model that equates stock dividends with consumption does not improve matters. Also the real interest rate does not vary greatly in the short run; the standard deviation of the ex post real commercial paper return in Table 8.1 is 5.5%, and Table 8.2 shows that about a third of the variance of this return is forecastable, implying a standard deviation for the expected real interest rate of only 3%. Since the standard deviation of consumption growth is also about 3%, large values of $\kappa$ and $\gamma$ in equation (8.4.11) tend to produce counterfactual volatility in the expected real interest rate. Similar problems arise in the internal-habit model.
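The arithmetic in this paragraph, and the volatility that (8.4.11) would imply, can be reproduced with a short Python sketch. The first three inputs are the figures quoted above; the volatility function additionally assumes i.i.d. consumption growth, so that $E_t[\Delta c_{t+1}]$ is constant and only the term $-\kappa(\gamma - 1)\Delta c_t$ moves the riskless rate. The $(\kappa, \gamma)$ pairs are hypothetical.

```python
import math

SD_EX_POST = 0.055        # sd of ex post real commercial paper return (Table 8.1, as quoted)
FORECASTABLE = 1.0 / 3.0  # share of its variance that is forecastable (Table 8.2, as quoted)
SIGMA_C = 0.03            # sd of consumption growth, "about 3%" per the text

# Sd of the expected real rate: sqrt(forecastable share) times the ex post sd.
sd_expected = math.sqrt(FORECASTABLE) * SD_EX_POST  # about 0.032


def implied_sd_riskless(kappa, gamma, sigma_c):
    """Sd of r_f implied by (8.4.11) when only -kappa*(gamma-1)*dc_t varies."""
    return abs(kappa * (gamma - 1.0)) * sigma_c


for kappa, gamma in [(0.5, 2.0), (0.5, 20.0), (1.0, 20.0)]:
    ratio = implied_sd_riskless(kappa, gamma, SIGMA_C) / sd_expected
    print(f"kappa={kappa}, gamma={gamma}: model sd / observed sd = {ratio:.1f}")
```

Under these assumptions a large $\gamma$ pushes the implied volatility of the expected real rate to many times the observed value.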

This difficulty with the riskless real interest rate is a fundamental problem for habit-formation models. Time-nonseparable preferences make marginal utility volatile even when consumption is smooth, because consumers derive utility from consumption relative to its recent history rather than from the absolute level of consumption. But unless the consumption and habit processes take particular forms, time-nonseparability also creates large swings in expected marginal utility at successive dates, and this implies large movements in the real interest rate. We now present an external-habit specification in which it is possible to solve this problem.


Difference Models 


Consider a model in which the utility function is

$$E_t \sum_{j=0}^{\infty} \delta^j\, \frac{(C_{t+j} - X_{t+j})^{1-\gamma} - 1}{1-\gamma}, \qquad (8.4.13)$$

and for simplicity treat the habit level $X_t$ as external. This model differs from the ratio model in two important ways. First, in the difference model the agent's risk aversion varies with the level of consumption relative to habit, whereas risk aversion is constant in the ratio model. Second, in the difference model consumption must always be above habit for utility to be well-defined, whereas this is not required in the ratio model.

To understand the first point, it is convenient to work with the surplus consumption ratio $S_t$, defined by

$$S_t \equiv \frac{C_t - X_t}{C_t}. \qquad (8.4.14)$$

The surplus consumption ratio gives the fraction of total consumption that is surplus to subsistence or habit requirements. If habit $X_t$ is held fixed as consumption $C_t$ varies, the normalized curvature of the utility function, which would equal the coefficient of relative risk aversion and would be a constant $\gamma$ in the conventional power utility model, is

$$-\frac{C_t\, u_{CC}}{u_C} = \frac{\gamma}{S_t}. \qquad (8.4.15)$$

This measure of risk aversion rises as the surplus consumption ratio $S_t$ declines, that is, as consumption declines toward habit.
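Equation (8.4.15) follows from differentiating $u(C) = (C - X)^{1-\gamma}/(1-\gamma)$ with $X$ held fixed: $u_C = (C - X)^{-\gamma}$ and $u_{CC} = -\gamma (C - X)^{-\gamma - 1}$, so $-C u_{CC}/u_C = \gamma C/(C - X) = \gamma/S$. The Python sketch below checks this against a finite-difference estimate; the values of $\gamma$, $C$, and $X$ are hypothetical.

```python
def marginal_utility(C, X, gamma):
    """u_C for u(C) = (C - X)^(1 - gamma) / (1 - gamma), habit X held fixed."""
    return (C - X) ** (-gamma)


def local_curvature(C, X, gamma):
    """-C u_CC / u_C = gamma / S, with S = (C - X) / C as in (8.4.14)."""
    if C <= X:
        raise ValueError("consumption must exceed habit")
    S = (C - X) / C
    return gamma / S


def numerical_curvature(C, X, gamma, h=1e-6):
    """Central-difference estimate of -C u_CC / u_C."""
    u_cc = (marginal_utility(C + h, X, gamma) - marginal_utility(C - h, X, gamma)) / (2 * h)
    return -C * u_cc / marginal_utility(C, X, gamma)


gamma = 2.0
for X in (0.0, 0.5, 0.9):  # habit approaching consumption C = 1
    print(X, round(local_curvature(1.0, X, gamma), 3),
          round(numerical_curvature(1.0, X, gamma), 3))
```

As habit rises from 0 to 0.9 of consumption, the curvature rises from $\gamma = 2$ to $\gamma/0.1 = 20$.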

Risk aversion may also be measured by the normalized curvature of the value function (maximized utility expressed as a function of wealth), or by the volatility of the stochastic discount factor, or by the maximum Sharpe ratio available in asset markets. While these measures of risk aversion are different from each other in this model, they all move inversely with $S_t$. Note that $\gamma$, the curvature parameter in utility, is no longer a measure of risk aversion in this model.

The requirement that consumption always be above habit is satisfied automatically in microeconomic models with exogenous asset returns and endogenous consumption, as in Constantinides (1990) and Sundaresan (1989). It presents a more serious problem in models with exogenous consumption processes. To handle this problem Campbell and Cochrane (1995) specify a nonlinear process by which habit adjusts to consumption, remaining below consumption at all times. Campbell and Cochrane write down a process for the log surplus consumption ratio $s_t = \log(S_t)$. They assume that log consumption follows a random walk with drift $g$ and innovation $v_{t+1}$, $\Delta c_{t+1} = g + v_{t+1}$. They propose an AR(1) model for $s_t$:

$$s_{t+1} = (1 - \phi)\bar{s} + \phi s_t + \lambda(s_t)\, v_{t+1}. \qquad (8.4.16)$$

Here $\bar{s} = \log(\bar{S})$, where $\bar{S}$ is the steady-state surplus consumption ratio. The parameter $\phi$ governs the persistence of the log surplus consumption ratio, while the sensitivity function $\lambda(s_t)$ controls the sensitivity of $s_{t+1}$ and thus of log habit $x_{t+1}$ to innovations in consumption growth $v_{t+1}$.

Equation (8.4.16) specifies that today's habit is a complex nonlinear function of current and past consumption. By taking a linear approximation around the steady state, however, it may be shown that (8.4.16) is approximately a traditional habit-formation model in which log habit responds slowly to log consumption,

$$x_{t+1} \approx \bar{x} + \frac{g}{1-\phi} + (1 - \phi)\sum_{j=0}^{\infty} \phi^j c_{t-j}, \qquad (8.4.17)$$

where $\bar{x} = \log(1 - \bar{S})$ is the steady-state value of $x - c$. The problem with the traditional model (8.4.17) is that it allows consumption to fall below habit, resulting in infinite or negative marginal utility. A process for $s_t$ defined over the real line implies that consumption can never fall below habit.
Since habit is external, the marginal utility of consumption is $u'(C_t) = (C_t - X_t)^{-\gamma} = S_t^{-\gamma} C_t^{-\gamma}$. The stochastic discount factor is then

$$M_{t+1} = \delta\,\frac{u'(C_{t+1})}{u'(C_t)} = \delta \left(\frac{S_{t+1}}{S_t}\,\frac{C_{t+1}}{C_t}\right)^{-\gamma}. \qquad (8.4.18)$$

In the standard power utility model $S_t = 1$, so the stochastic discount factor is just consumption growth raised to the power $-\gamma$. To get a volatile stochastic discount factor one needs a large value of $\gamma$. In the habit-formation model one can instead get a volatile stochastic discount factor from a volatile surplus consumption ratio $S_t$.

The riskless real interest rate is related to the stochastic discount factor by $1 + R_{f,t+1} = 1/E_t[M_{t+1}]$. Taking logs, and using (8.4.16) and (8.4.18), the log riskless real interest rate is

$$r_{f,t+1} = -\log\delta + \gamma g - \gamma(1 - \phi)(s_t - \bar{s}) - \frac{\gamma^2\sigma_v^2}{2}\big[\lambda(s_t) + 1\big]^2. \qquad (8.4.19)$$

The first two terms on the right-hand side of (8.4.19) are familiar from the power utility model (8.2.6), while the last two terms are new. The third term (linear in $(s_t - \bar{s})$) reflects intertemporal substitution, or mean-reversion in marginal utility. If the surplus consumption ratio is low, the marginal utility of consumption is high. However, the surplus consumption ratio is expected to revert to its mean, so marginal utility is expected to fall in the future. Therefore, the consumer would like to borrow and this drives up the equilibrium risk-free interest rate. The fourth term (linear in $[\lambda(s_t) + 1]^2$) reflects precautionary savings. As uncertainty increases, consumers become more willing to save and this drives down the equilibrium riskless interest rate.

If this model is to generate stable real interest rates like those observed in the data, the serial correlation parameter $\phi$ must be near one. Also, the sensitivity function $\lambda(s_t)$ must decline with $s_t$ so that uncertainty is high when $s_t$ is low and the precautionary saving term offsets the intertemporal substitution term. In fact, Campbell and Cochrane parametrize the $\lambda(s_t)$ function so that these two terms exactly offset each other everywhere, implying a constant riskless interest rate.
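One way to see how such an offset can work: requiring the $s_t$-dependent terms of (8.4.19) to cancel gives $[\lambda(s_t) + 1]^2 = (1 - 2(s_t - \bar{s}))/\bar{S}^2$ when $\bar{S} = \sigma_v\sqrt{\gamma/(1-\phi)}$, and the riskless rate is then $-\log\delta + \gamma g - \gamma(1-\phi)/2$. The Python sketch below derives that function from (8.4.19) rather than quoting Campbell and Cochrane's exact specification, restricts it to $s_t \le s_{max}$ where $\lambda(s_t) \ge 0$, and checks that the rate is the same at every grid point. Parameter values are hypothetical.

```python
import math


def riskless_rate(s, s_bar, delta, gamma, g, phi, sigma_v, lam):
    """Log riskless rate (8.4.19) for a given sensitivity function lam(s)."""
    return (-math.log(delta) + gamma * g
            - gamma * (1.0 - phi) * (s - s_bar)
            - 0.5 * gamma**2 * sigma_v**2 * (lam(s) + 1.0) ** 2)


def offsetting_lambda(gamma, phi, sigma_v):
    """Sensitivity function chosen so the s-dependent terms of (8.4.19) cancel.

    Uses S_bar = sigma_v * sqrt(gamma / (1 - phi)) and s_bar = log(S_bar);
    lam(s) is real and nonnegative only for s <= s_max.
    """
    S_bar = sigma_v * math.sqrt(gamma / (1.0 - phi))
    s_bar = math.log(S_bar)
    s_max = s_bar + 0.5 * (1.0 - S_bar**2)

    def lam(s):
        if s > s_max:
            raise ValueError("s above s_max: lambda(s) would be negative")
        return math.sqrt(1.0 - 2.0 * (s - s_bar)) / S_bar - 1.0

    return lam, s_bar, s_max


# Hypothetical annual parameters.
delta, gamma, g, phi, sigma_v = 0.89, 2.0, 0.019, 0.87, 0.015
lam, s_bar, s_max = offsetting_lambda(gamma, phi, sigma_v)
grid = [s_bar - 3.0 + i * (s_max - s_bar + 3.0) / 10 for i in range(10)]
rates = [riskless_rate(s, s_bar, delta, gamma, g, phi, sigma_v, lam) for s in grid]
print(min(rates), max(rates))  # equal up to floating-point rounding
print(-math.log(delta) + gamma * g - 0.5 * gamma * (1.0 - phi))
```

The sensitivity function declines as $s_t$ rises, matching the requirement in the text that uncertainty be high when the surplus consumption ratio is low.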

Even with a constant riskless interest rate and random-walk consumption, the external-habit model can produce a large equity premium, volatile stock prices, and predictable excess stock returns. The basic mechanism is time-variation in risk aversion. When consumption falls relative to habit, the resulting increase in risk aversion drives up the risk premium on risky assets such as stocks. This also drives down the prices of stocks, helping to explain why stock returns are so much more volatile than consumption growth or riskless real interest rates.

Campbell and Cochrane (1995) calibrate their model to US data on consumption and dividends, solving for equilibrium stock prices in the tradition of Mehra and Prescott (1985). There is also some work on habit formation that uses actual stock return data in the tradition of Hansen and Singleton (1982, 1983). Heaton (1995), for example, estimates an internal-habit model allowing for time-aggregation of the data and for some durability of those goods formally described as nondurable in the national income accounts. Durability can be thought of as the opposite of habit formation, in that consumption expenditure today lowers the marginal utility of consumption expenditure tomorrow. Heaton finds that durability predominates at high frequencies, and habit formation at lower frequencies. However, his habit-formation model, like the simple power utility model, is rejected statistically.

Both these approaches assume that aggregate consumption is the driving process for marginal utility. An alternative view is that, for reasons discussed in Section 8.3.2, the consumption of stock market investors may not be adequately proxied by macroeconomic data on aggregate consumption. Under this view the driving process for a habit-formation model should be a process with a reasonable mean and standard deviation, but need not be highly correlated with aggregate consumption.


8.4.2 Psychological Models of Preferences


Psychologists and experimental economists have found that in experimental settings, people make choices that differ in several respects from the standard model of expected utility. In response to these findings unorthodox "psychological" models of preferences have been suggested, and some recent research has begun to apply these models to asset pricing.

Useful general references include Hogarth and Reder (1987) and Kreps (1988).

In the standard model, the agent's objective at time $t$ is

$$U(C_t) + E_t \sum_{j=1}^{\infty} \delta^j U(C_{t+j}). \qquad (8.4.20)$$

This specification has three main components: the period utility function $U(C_t)$, the geometric discounting with discount factor $\delta$, and the mathematical expectations operator $E_t$. Psychological models alter one or more of these components.

The best-known psychological model of decision-making is probably the prospect theory of Kahneman and Tversky (1979) and Tversky and Kahneman (1992). Prospect theory was originally formulated in a static context, so it does not emphasize discounting, but it does alter the other two elements of the standard framework. Instead of defining preferences over consumption, preferences are defined over gains and losses relative to some benchmark outcome. A key feature of the theory is that losses are given greater weight than gains. Thus if $x$ is a random variable that is positive for gains and negative for losses, utility might depend on

$$v(x) = \begin{cases} \dfrac{x^{1-\gamma_1}}{1-\gamma_1}, & x \ge 0, \\[1ex] -\lambda\, \dfrac{(-x)^{1-\gamma_2}}{1-\gamma_2}, & x < 0. \end{cases} \qquad (8.4.21)$$

Here $\gamma_1$ and $\gamma_2$ are curvature parameters for gains and losses, which may differ from one another, and $\lambda > 1$ measures the extent of loss aversion, the greater weight given to losses than gains.
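A minimal Python sketch of the value function (8.4.21). The parameter values are assumptions for illustration: $\gamma_1 = \gamma_2 = 0.12$ and $\lambda = 2.25$, which correspond (up to the scale factor $1/(1-\gamma)$) to the median estimates of curvature 0.88 and loss aversion 2.25 reported by Tversky and Kahneman (1992).

```python
def prospect_value(x, gamma1=0.12, gamma2=0.12, lam=2.25):
    """Value function (8.4.21): concave over gains, convex over losses,
    with losses scaled up by the loss-aversion parameter lam > 1."""
    if x >= 0:
        return x ** (1.0 - gamma1) / (1.0 - gamma1)
    return -lam * (-x) ** (1.0 - gamma2) / (1.0 - gamma2)


# A 50-50 gamble to win or lose 100: negative value despite a zero mean.
gamble = 0.5 * prospect_value(100.0) + 0.5 * prospect_value(-100.0)
print(round(prospect_value(100.0), 2), round(prospect_value(-100.0), 2), round(gamble, 2))
```

With equal curvature over gains and losses, a loss is valued at exactly $-\lambda$ times the equal-sized gain, so a loss-averse agent turns down a fair coin-flip gamble.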

Prospect theory also changes the mathematical expectations operator in (8.4.20). The expectations operator weights each possible outcome by its probability; prospect theory allows outcomes to be weighted by nonlinear functions of their probabilities (see Kahneman and Tversky (1979)) or by nonlinear functions of the probabilities of a better or worse outcome. Other, more general models of investor psychology also replace the mathematical expectations operator with a model of subjective expectations. See for example Barberis, Shleifer, and Vishny (1996), DeLong, Shleifer, Summers, and Waldmann (1990b), and Froot (1989).

In applying prospect theory to asset pricing, a key question is how the benchmark outcome defining gains and losses evolves over time. Benartzi and Thaler (1995) assume that investors have preferences defined over returns, where a zero return marks the boundary between a gain and a loss. Returns may be measured over different horizons; a $K$-month return is relevant if investors update their benchmark outcomes every $K$ months. Benartzi and Thaler consider values of $K$ ranging from one to 18. They show that loss aversion combined with a short horizon can rationalize investors' unwillingness to hold stocks even in the face of a large equity premium. Bonomo and Garcia (1993) obtain similar results in a consumption-based model with loss aversion.

In related work, Epstein and Zin (1990) have developed a parametric
version of the choice theory of Yaari (1987). Their specification for period
utility displays first-order risk aversion: the risk premium required to induce
an investor to take a small gamble is proportional to the standard deviation
of the gamble rather than to its variance, as in standard theory. This feature
increases the risk premia predicted by the model, but in a calibration
exercise in the style of Mehra and Prescott (1985), Epstein and Zin find that
they can fit only about one-third of the historical equity premium.
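The distinction between first-order and second-order risk aversion can be checked numerically. The sketch below is an illustration under stated assumptions, not Epstein and Zin's (1990) specification: it compares power utility with $\gamma = 2$ against a hypothetical utility function that is kinked at a reference wealth of 1, with a loss weight of 2.25 chosen purely for illustration. For a fair 50/50 gamble of $\pm\sigma$ around wealth 1, the smooth utility's premium shrinks like $\sigma^2$, while the kinked utility's premium shrinks only like $\sigma$.

```python
def risk_premium(u, u_inv, w0, sigma):
    """Premium pi solving u(w0 - pi) = E[u(w0 + eps)], eps = +/-sigma w.p. 1/2 each."""
    expected_u = 0.5 * u(w0 + sigma) + 0.5 * u(w0 - sigma)
    certainty_equivalent = u_inv(expected_u)
    return w0 - certainty_equivalent


# Smooth benchmark: power utility with gamma = 2, u(w) = -1/w.
def power_u(w):
    return -1.0 / w


def power_u_inv(v):
    return -1.0 / v


# Hypothetical kinked utility: slope 1 above the reference wealth w = 1 and
# slope LOSS_WEIGHT below it (an illustrative stand-in, not Epstein-Zin).
LOSS_WEIGHT = 2.25


def kinked_u(w):
    x = w - 1.0
    return x if x >= 0.0 else LOSS_WEIGHT * x


def kinked_u_inv(v):
    return 1.0 + (v if v >= 0.0 else v / LOSS_WEIGHT)


if __name__ == "__main__":
    for sigma in (0.1, 0.01, 0.001):
        smooth = risk_premium(power_u, power_u_inv, 1.0, sigma)
        kinked = risk_premium(kinked_u, kinked_u_inv, 1.0, sigma)
        # Second order: smooth / sigma^2 stays flat. First order: kinked / sigma stays flat.
        print(f"sigma={sigma:<6} smooth/sigma^2={smooth / sigma**2:.4f} "
              f"kinked/sigma={kinked / sigma:.4f}")
```

For these two utilities the premium is exactly $\sigma^2$ in the smooth case and $\sigma(\lambda - 1)/(2\lambda)$ in the kinked case, so both printed ratios stay constant as $\sigma$ shrinks.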

Another strand of the literature alters the specification of discounting
in (8.4.20). Ainslie (1992) and Loewenstein and Prelec (1992) have argued
that experimental evidence suggests not geometric discounting but hyperbolic
discounting: the discount factor for horizon $K$ is not $\delta^K$ but a function of
the form $(1 + \delta_1 K)^{-\delta_2/\delta_1}$, where both $\delta_1$ and $\delta_2$ are positive, as in the standard
theory. This functional form implies that a lower discount rate is used
for periods further in the future. Laibson (1996) argues that hyperbolic
discounting is well approximated by a utility specification

$$U(C_t) + \beta\, \mathrm{E}_t \sum_{j=1}^{\infty} \delta^j U(C_{t+j}), \tag{8.4.22}$$

where the additional parameter $\beta < 1$ implies greater discounting over
the next period than between any periods further in the future.
Hyperbolic discounting leads to time-inconsistent choices: because the
discount rate between any two dates shifts as the dates draw nearer, the
optimal plan for those dates changes over time even if no new information
arrives. The implications for consumption and portfolio choice depend
on the way in which this time-consistency problem is resolved. Laibson
(1996) derives the Euler equations for consumption choice assuming that
the individual chooses each period's consumption in that period without
being able to constrain future consumption choices. Interestingly, he shows
that with hyperbolic discounting the elasticity of intertemporal substitution
is less than the reciprocal of the coefficient of relative risk aversion even
when the period utility function has the power form.
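The discount factors discussed above can be compared directly. The following sketch uses hypothetical parameter values ($\delta = 0.95$; $\delta_1 = 1$, $\delta_2 = 0.5$; $\beta = 0.7$) and a hypothetical choice between 1 unit at date $s$ and 1.2 units at date $s+1$. It shows three things: the one-period discount rate falls with the horizon under hyperbolic discounting; $\beta < 1$ in (8.4.22) adds extra discounting over the first period only; and both features produce a preference reversal as the dates draw nearer.

```python
def geometric(k, delta):
    """Standard geometric discount factor delta**K."""
    return delta ** k


def generalized_hyperbolic(k, d1, d2):
    """Hyperbolic discount factor (1 + d1*K)**(-d2/d1), with d1, d2 > 0."""
    return (1.0 + d1 * k) ** (-d2 / d1)


def quasi_hyperbolic(k, beta, delta):
    """Weight on period-(t+K) utility in (8.4.22): 1 for K = 0, beta*delta**K for K >= 1."""
    return 1.0 if k == 0 else beta * delta ** k


def one_period_rate(discount, k):
    """Discount rate applied between horizons K and K+1: D(K)/D(K+1) - 1."""
    return discount(k) / discount(k + 1) - 1.0


def prefers_later(discount, k, early=1.0, late=1.2):
    """Viewed K periods ahead, is `late` at date s+1 preferred to `early` at date s?"""
    return late * discount(k + 1) > early * discount(k)


if __name__ == "__main__":
    geo = lambda k: geometric(k, 0.95)
    hyp = lambda k: generalized_hyperbolic(k, 1.0, 0.5)
    qh = lambda k: quasi_hyperbolic(k, 0.7, 0.95)
    for name, d in (("geometric", geo), ("hyperbolic", hyp), ("beta-delta", qh)):
        rates = [round(one_period_rate(d, k), 4) for k in (0, 1, 5, 20)]
        choices = [prefers_later(d, k) for k in (5, 1, 0)]
        print(f"{name:<11} rates={rates} prefers_later at K=5,1,0: {choices}")
```

Under geometric discounting the ranking of the two payoffs never changes; under the other two forms the later, larger payoff is preferred when both dates are distant but rejected once the earlier date arrives.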




8.5 Conclusion 


Financial economists have not yet produced a generally accepted model 
of the stochastic discount factor. Nonetheless substantial progress has been 
made. We know that the stochastic discount factor must be extremely volatile 
if it is to explain the cross-sectional pattern of asset returns. We also know
that the conditional expectation of the stochastic discount factor must be
comparatively stable in order to explain the stability of the riskless real
interest rate. These properties put severe restrictions on the kinds of asset 
pricing models that can be considered. 

There is increasing interest in the idea that risk aversion may vary over 
time with the state of the economy. Time-varying risk aversion can explain
the large body of evidence that excess returns on stocks and other risky
assets are predictable. One mechanism that can produce time-varying risk
aversion is habit formation in the utility function of a representative agent.
But it is also possible that investors appear to have time-varying risk aversion
because they trade on the basis of irrational expectations, or that time-varying
risk aversion arises from the interactions of heterogeneous agents.
Grossman and Zhou (1996), for example, present a model in which two
agents with different risk-aversion coefficients trade with each other. One
of the agents has an exogenous lower bound on wealth, and the resulting
equilibrium has a time-varying price of risk. This is likely to be an active
area for future research.


Problems—Chapter 8 


8.1 Prove that the benchmark portfolio has the properties (P1)
through (P5) stated on pages 298 and 300 of this chapter.


8.2 Consider an economy with a representative agent who has power
utility with coefficient of relative risk aversion $\gamma$. The agent receives
a nonstorable endowment. The process for the log endowment, or
equivalently the log of consumption $c_t$, is

$$\Delta c_{t+1} = \mu + \phi\, \Delta c_t + \epsilon_{t+1},$$

where the coefficient $\phi$ may be either positive or negative. If $\phi$ is positive,
then endowment fluctuations are highly persistent; if it is negative, then
they have an important transitory component.
8.2.1 Assume that consumption and asset returns are jointly lognormal,
with constant variances and covariances.
i. Use the representative agent's Euler equations to show that
the expected log return on any asset is a linear function of the
expected growth rate of the endowment. What is the slope coefficient
in this relationship?

ii. Use the representative agent's Euler equations to show that
the difference between the expected log return on any asset and
the log riskfree interest rate, plus one-half the own variance of
the log asset return (call this sum the "premium" on the asset), is
proportional to the conditional covariance of the log asset return
with consumption growth. What is the slope coefficient in this
relationship?


8.2.2 To a close approximation, the unexpected return on any
asset $i$ can be written as

$$r_{i,t+1} - \mathrm{E}_t r_{i,t+1} = (\mathrm{E}_{t+1} - \mathrm{E}_t)\left[\sum_{j=0}^{\infty} \rho^j \Delta d_{i,t+1+j} - \sum_{j=1}^{\infty} \rho^j r_{i,t+1+j}\right],$$

where $d_{it}$ is the dividend paid on asset $i$ at time $t$. This approximation
was developed as (7.1.25) in Chapter 7.
i. Use this expression to calculate the unexpected return on an
equity which pays aggregate consumption as its dividend.

ii. Use this expression to calculate the unexpected return on a
real consol bond which has a fixed real dividend each period.


8.2.3
i. Calculate the equity premium and the consol bond premium.

ii. Show that the bond premium has the opposite sign to $\phi$ and is
proportional to the square of $\gamma$. Give an economic interpretation
of this result.

iii. Show that the equity premium is always larger than the bond
premium, and the difference between them is proportional to $\gamma$.
Give an economic interpretation of this result.

iv. Relate your discussion to the empirical literature on the "equity
premium puzzle."


8.3 Consider a two-period world with a continuum of consumers.
Each consumer has a random endowment in the second period and
consumes only in the second period. In the first period, securities are
traded but no money changes hands until the second period. All consumers
have log utility over second-period consumption.
8.3.1 Suppose that all consumers' endowments are the same. They
are $m$ with probability 1/2 and $(1-a)m$ with probability 1/2, where
$0 < a < 1$. Suppose that a claim to the second-period aggregate
endowment is traded and that it costs $p$ in either state, payable in the
second period. Compute the equilibrium price $p$ and the expected
return on the claim.

8.3.2 Now suppose that in the second period, with probability 1/2,
all consumers receive $m$; with probability 1/2, a fraction $(1-b)$ of
consumers receive $m$ and a fraction $b$ receive $(1-a/b)m$. In the
first period, all consumers face the same probability of being in the
latter group, but no insurance markets exist through which they
can hedge this risk. Compute the expected return on the claim
defined above. Is it higher or lower than before? Is it bounded as
a function of $a$ and $m$?

8.3.3 Relate your answer to the recent empirical literature on the
determination of stock returns in representative-agent models. To
what extent do your results in parts 8.3.1 and 8.3.2 depend on the
details of the model, and to what extent might they hold more
generally?

Note: This problem is based on Mankiw (1986).
Derivative Pricing Models


THE PRICING OF OPTIONS, warrants, and other derivative securities (financial
securities whose payoffs depend on the prices of other securities) is one of
the great successes of modern financial economics. Based on the well-known
Law of One Price or no-arbitrage condition, the option pricing models of
Black and Scholes (1973) and Merton (1973b) gained an almost immediate
acceptance among academics and investment professionals that is unparalleled
in the history of economic science.¹

The fundamental insight of the Black-Scholes and Merton models is 
that under certain conditions an option's payoff can be exactly replicated 
by a particular dynamic investment strategy involving only the underlying 
stock and riskless debt. This particular strategy may be constructed to be 
self-financing, i.e., requiring no cash infusions except at the start and allowing
no cash withdrawals until the option expires; since the strategy replicates
the option's payoff at expiration, the initial cost of this self-financing investment
strategy must be identical to the option's price, otherwise an arbitrage
opportunity will arise. This no-arbitrage condition yields not only the option's
price but also the means to replicate the option synthetically (via the
dynamic investment strategy of stocks and riskless debt) if it does not exist.

This method of pricing options has since been used to price literally 
hundreds of other types of derivative securities, some considerably more 
complex than a simple option. In the majority of these cases, the pricing 
formula can only be expressed implicitly as the solution of a parabolic par- 
tial differential equation (PDE) with boundary conditions and initial values 
determined by the contractual terms of each security. To obtain actual
prices, the PDE must be solved numerically, which might have been problematic
in 1973 when Black and Scholes and Merton first published their
papers but is now commonplace thanks to the breakthroughs in computer
technology over the past three decades. Although a detailed discussion of
derivative pricing models is beyond the scope of this text (there are many
other excellent sources such as Cox and Rubinstein (1985), Hull (1993),
and Merton (1990)), we do provide brief reviews of Brownian motion in
Section 9.1 and Merton's derivation of the Black-Scholes formula in Section
9.2 for convenience.

¹See Bernstein (1992, Chapter 11) for a lively account of the intellectual history of the
Black-Scholes/Merton option pricing formula.

Ironically, although pricing derivative securities is often highly
computation-intensive, in principle it leaves very little room for traditional statistical
inference since, by the very nature of the no-arbitrage pricing paradigm,
there exists no "error term" to be minimized and no corresponding statistical
fluctuations to contend with. After all, if an option's price is determined
exactly (without error) as some (possibly time-varying) combination of
prices of other traded assets, where is the need for statistical inference?
Methods such as regression analysis do not seem to play a role even in the 
application of option pricing models to data. 

However, there are at least two aspects of the implementation of derivative
pricing models that do involve statistical inference, and we shall focus
on them in this chapter. The first aspect is the problem of estimating the
parameters of continuous-time price processes which are inputs for parametric
derivative pricing formulas. We use the qualifier parametric in describing the
derivative pricing formulas considered in this chapter because several
nonparametric approaches to pricing derivatives have recently been proposed
(see, for example, Ait-Sahalia [1992], Ait-Sahalia and Lo [1995], Hutchinson,
Lo, and Poggio [1994], and Rubinstein [1994]).² We shall consider
nonparametric derivative pricing models in Chapter 12, and focus on issues
surrounding parametric models in Section 9.3.

The second aspect involves the pricing of path-dependent derivatives
by Monte Carlo simulation. A derivative security is said to be path-dependent
if its payoff depends in some way on the entire path of the underlying asset's
price during the derivative's life. For example, a put option which gives
the holder the right to sell the underlying asset at its average price (where
the average is calculated over the life of the option) is path-dependent
because the average price is a function of the underlying asset's entire price
path. Although a few analytical approximations for pricing path-dependent
derivatives do exist, by far the most effective method for pricing them is
by Monte Carlo simulation. This raises several issues such as measuring
the accuracy of simulated prices, determining the number of simulations
required for a desired level of accuracy, and designing the simulations to
make the most economical use of computational resources. We turn to
these issues in Section 9.4.

²In contrast to the traditional parametric approach, in which the price process of the
underlying asset is fully specified up to a finite number of unknown parameters (e.g., a
lognormal diffusion with unknown drift and volatility parameters), a nonparametric approach does
not specify the price process explicitly and attempts to infer it from the data under suitable
regularity conditions.
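The Monte Carlo issues listed above (the accuracy of a simulated price and the number of simulations needed) can be sketched for the average-price put described in the text. The code below is a minimal illustration, not the method of Section 9.4. It assumes geometric Brownian motion with drift equal to the riskless rate $r$ under the pricing measure and discounting at $r$ (assumptions not developed in this excerpt), monitors the average at discrete dates, and uses hypothetical parameter values and function names. Because the standard error falls like $s/\sqrt{N}$, halving it requires four times as many paths.

```python
import math
import random


def mc_average_price_put(p0, r, sigma, T, n_steps, n_paths, seed=0):
    """Monte Carlo value of a put that lets the holder sell the asset at its
    arithmetic average price: payoff max(A - P(T), 0), where A averages the
    price at the n_steps monitoring dates h, 2h, ..., T.

    Assumptions (not taken from this section): the price follows a geometric
    Brownian motion with drift r under the pricing measure, and payoffs are
    discounted at the riskless rate r.
    Returns (price estimate, standard error, std. dev. of discounted payoffs).
    """
    rng = random.Random(seed)
    h = T / n_steps
    drift = (r - 0.5 * sigma ** 2) * h   # mean of each log-price increment
    vol = sigma * math.sqrt(h)            # std. dev. of each log-price increment
    discount = math.exp(-r * T)
    payoffs = []
    for _ in range(n_paths):
        log_p = math.log(p0)
        running_sum = 0.0
        for _ in range(n_steps):
            log_p += drift + vol * rng.gauss(0.0, 1.0)
            running_sum += math.exp(log_p)
        average = running_sum / n_steps
        payoffs.append(discount * max(average - math.exp(log_p), 0.0))
    mean = sum(payoffs) / n_paths
    var = sum((x - mean) ** 2 for x in payoffs) / (n_paths - 1)
    return mean, math.sqrt(var / n_paths), math.sqrt(var)


def paths_needed(payoff_std, target_std_err):
    """Standard error falls like s/sqrt(N), so about (s / target)^2 paths are needed."""
    return math.ceil((payoff_std / target_std_err) ** 2)


if __name__ == "__main__":
    price, se, s = mc_average_price_put(100.0, 0.05, 0.2, 1.0, 50, 4000, seed=1)
    print(f"price ~ {price:.3f} (s.e. {se:.3f}); paths for s.e. 0.01: {paths_needed(s, 0.01)}")
```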


9.1 Brownian Motion 


For certain financial applications it is often more convenient to model prices
as evolving continuously through time rather than discretely at fixed dates.
For example, Merton's derivation of the Black-Scholes option-pricing formula
relies heavily on the ability to adjust portfolio holdings continuously in
time so as to replicate an option's payoff exactly. Such a replicating portfolio
strategy is often impossible to construct in a discrete-time setting; hence
pricing formulas for options and other derivative securities are almost always
derived in continuous time (see Section 9.2 for a more detailed discussion).³


9.1.1 Constructing Brownian Motion 


The first formal mathematical model of financial asset prices, developed by
Bachelier (1900) for the larger purpose of pricing warrants trading on the
Paris Bourse, was the continuous-time random walk, or Brownian motion.
Therefore, it should not be surprising that Brownian motion plays such a
central role in modern derivative pricing models. This continuous-time
process is closely related to the discrete-time versions of the random walk
described in Section 2.1 of Chapter 2, and we shall take the discrete-time
random walk as a starting point in our heuristic construction of the Brownian
motion.⁴ Our goal is to use the discrete-time random walk to define a
sequence of continuous-time processes which will converge to a continuous-time
analog of the random walk in the limit.


The Discrete-Time Random Walk


Denote by $\{p_k\}$ a discrete-time random walk with increments that take on
only two values, $\Delta$ and $-\Delta$:

$$p_k = p_{k-1} + \epsilon_k, \qquad \epsilon_k = \begin{cases} \Delta & \text{with probability } \pi \\ -\Delta & \text{with probability } \pi' \equiv 1 - \pi, \end{cases} \tag{9.1.1}$$

where $\epsilon_k$ is IID (hence $p_k$ follows the Random Walk 1 model of Section 2.1.1 of
Chapter 2), and $p_0$ is fixed. Consider the following continuous-time process
$p_n(t)$, $t \in [0, T]$, which can be constructed from the discrete-time process
$\{p_k\}$, $k = 1, \ldots, n$, as follows: Partition the time interval $[0, T]$ into $n$ pieces,
each of length $h \equiv T/n$, and define the process

$$p_n(t) \equiv p_{[t/h]} = p_{[nt/T]}, \qquad t \in [0, T], \tag{9.1.2}$$

where $[x]$ denotes the greatest integer less than or equal to $x$. $p_n(t)$ is a
rather simple continuous-time process, constant except at times $t = kh$,
$k = 1, \ldots, n$ (see Figure 9.1).

³There are two notable exceptions: the equilibrium approach of Rubinstein (1976) and
the binomial option-pricing model of Cox, Ross, and Rubinstein (1979).

⁴For a more rigorous derivation, see Billingsley (1968, Section 37).

Although $p_n(t)$ is indeed a well-defined continuous-time process, it is
still essentially the same as the discrete-time process $p_k$ since it only varies at
discrete points in time. However, if we allow $n$ to increase without bound
while holding $T$ fixed (hence forcing $h$ to approach 0), then the number
of points in $[0, T]$ at which $p_n(t)$ varies will also increase without bound.
If, at the same time, we adjust the step size $\Delta$ and the probabilities $\pi$ and
$\pi'$ appropriately as $n$ increases, then $p_n(t)$ converges in distribution to a
well-defined nondegenerate continuous-time process $p(t)$ with continuous
sample paths on $[0, T]$.
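The construction in (9.1.1)-(9.1.2) can be sketched directly. The helper names below are hypothetical. The walk uses generic $\Delta$ and $\pi$ (no scaling with $n$ is imposed here), and $p_n(t)$ simply looks up $p_{[nt/T]}$, so it is a step function that stays constant between grid points.

```python
import math
import random


def random_walk(n, delta, pi, p0=0.0, seed=None):
    """Discrete-time walk (9.1.1): p_k = p_{k-1} + eps_k, where eps_k = +delta
    with probability pi and -delta with probability 1 - pi, IID across k."""
    rng = random.Random(seed)
    path = [p0]
    for _ in range(n):
        eps = delta if rng.random() < pi else -delta
        path.append(path[-1] + eps)
    return path


def p_n(t, path, T):
    """Step process (9.1.2): p_n(t) = p_[nt/T] on [0, T], with n = len(path) - 1.
    Uses n*t/T rather than t/h; values of t exactly on the grid can still be
    affected by floating-point rounding."""
    n = len(path) - 1
    k = min(math.floor(n * t / T), n)
    return path[k]


if __name__ == "__main__":
    walk = random_walk(10, delta=0.1, pi=0.5, seed=3)
    for t in (0.0, 0.05, 0.15, 0.31, 0.39, 1.0):
        print(f"p_n({t}) = {p_n(t, walk, 1.0):+.1f}")
```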

To see why adjustments to $\Delta$, $\pi$, and $\pi'$ are necessary, consider the mean
and variance of $p_n(T)$:

$$\mathrm{E}[p_n(T)] = n(\pi - \pi')\Delta = \frac{T}{h}(\pi - \pi')\Delta \tag{9.1.3}$$
$$\mathrm{Var}[p_n(T)] = 4n\pi\pi'\Delta^2 = \frac{T}{h}\,4\pi\pi'\Delta^2. \tag{9.1.4}$$

As $n$ increases without bound, the mean and variance of $p_n(T)$ will also
increase without bound if $\Delta$, $\pi$, and $\pi'$ are held fixed. To obtain a well-defined
and nondegenerate limiting process, we must maintain finite moments for
$p_n(T)$ as $n$ approaches infinity. In particular, since we wish to obtain a
continuous-time version of the random walk, we should expect the mean
and variance of the limiting process $p(T)$ to be linear in $T$, as in (2.1.5) and
(2.1.6) for the discrete-time random walk. Therefore, we must adjust $\Delta$, $\pi$,
and $\pi'$ so that:

$$n(\pi - \pi')\Delta = \frac{T}{h}(\pi - \pi')\Delta \rightarrow \mu T \tag{9.1.5}$$
$$4n\pi\pi'\Delta^2 = \frac{T}{h}\,4\pi\pi'\Delta^2 \rightarrow \sigma^2 T, \tag{9.1.6}$$

and this is accomplished by setting:

$$\pi = \frac{1}{2}\left(1 + \frac{\mu}{\sigma}\sqrt{h}\right), \qquad \pi' = \frac{1}{2}\left(1 - \frac{\mu}{\sigma}\sqrt{h}\right), \qquad \Delta = \sigma\sqrt{h}. \tag{9.1.7}$$

The adjustments (9.1.7) imply that the step size decreases as $n$ increases,
but at the slower rate of $1/\sqrt{n}$ or $\sqrt{h}$. The probabilities converge to $\frac{1}{2}$, also
at the rate of $\sqrt{h}$, and hence we write:

$$\pi = \tfrac{1}{2} + O(\sqrt{h}), \qquad \pi' = \tfrac{1}{2} + O(\sqrt{h}), \qquad \Delta = O(\sqrt{h}), \tag{9.1.8}$$

where $O(\sqrt{h})$ denotes terms that are of the same asymptotic order as $\sqrt{h}$.⁵

Therefore, as $n$ increases, the random walk $p_n(t)$ varies more frequently
on $[0, T]$, but with smaller step size $\Delta$ and with up-step and down-step
probabilities approaching $\frac{1}{2}$.
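Under the scaling (9.1.7), the moments (9.1.3)-(9.1.4) can be evaluated exactly. In the sketch below (hypothetical parameter values and function names), the mean of $p_n(T) - p_0$ equals $\mu T$ for every $n$, while the variance equals $\sigma^2 T - \mu^2 T h$, which approaches $\sigma^2 T$ as $h \to 0$. This is one concrete way to see (9.1.5)-(9.1.6) at work.

```python
import math


def scaled_parameters(mu, sigma, T, n):
    """Step size and probabilities from (9.1.7), with h = T/n.
    Valid only when |mu/sigma| * sqrt(h) <= 1, so that pi lies in [0, 1]."""
    h = T / n
    root_h = math.sqrt(h)
    pi = 0.5 * (1.0 + (mu / sigma) * root_h)
    pi_prime = 0.5 * (1.0 - (mu / sigma) * root_h)
    delta = sigma * root_h
    if not 0.0 <= pi <= 1.0:
        raise ValueError("n too small: (9.1.7) gives a probability outside [0, 1]")
    return h, delta, pi, pi_prime


def walk_moments(mu, sigma, T, n):
    """Mean and variance (9.1.3)-(9.1.4) of p_n(T) - p_0 under (9.1.7)."""
    h, delta, pi, pi_prime = scaled_parameters(mu, sigma, T, n)
    mean = n * (pi - pi_prime) * delta
    var = 4.0 * n * pi * pi_prime * delta ** 2
    return mean, var


if __name__ == "__main__":
    mu, sigma, T = 0.1, 0.2, 1.0
    for n in (10, 100, 1000, 100000):
        m, v = walk_moments(mu, sigma, T, n)
        # Mean equals mu*T for every n; variance equals sigma^2*T - mu^2*T*h.
        print(f"n={n:>6}: mean={m:.6f}  var={v:.8f}  sigma^2*T={sigma**2 * T}")
```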


⁵A function $f(h)$ is said to be of the same asymptotic order as $g(h)$, denoted by
$f(h) \sim O(g(h))$, if the following relation is satisfied:

$$0 < \lim_{h \to 0} \left| \frac{f(h)}{g(h)} \right| < \infty.$$

A function $f(h)$ is said to be of smaller asymptotic order than $g(h)$, denoted by
$f(h) \sim o(g(h))$, if

$$\lim_{h \to 0} \left| \frac{f(h)}{g(h)} \right| = 0.$$


Some examples of asymptotic order relations are

$$h \sim o(\sqrt{h}), \qquad 3h \sim O(h), \qquad h^{3/2} \sim o(h), \qquad e^h \sim O(1).$$
The Continuous-Time Limit
By calculating the moment-generating function of $p_n(T)$ and taking its limit
as $n$ approaches infinity, it may be shown that the distribution of $p(T)$ is
normal with mean $\mu T$ and variance $\sigma^2 T$; thus $p_n(T)$ converges in distribution
to a $\mathcal{N}(\mu T, \sigma^2 T)$ random variable.
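The convergence in distribution of $p_n(T)$ can also be checked by simulation. The sketch below (hypothetical parameters and function names, standard-library random numbers) draws $p_n(T) - p_0$ as $\Delta$ times the number of up-steps minus down-steps, and compares sample moments and the empirical distribution function with those of $\mathcal{N}(\mu T, \sigma^2 T)$. Because $p_n(T)$ lives on a lattice with spacing $2\Delta$, the distribution functions agree only approximately for finite $n$.

```python
import math
import random


def simulate_p_n_T(mu, sigma, T, n, n_paths, seed=0):
    """Draws of p_n(T) - p_0 for the walk (9.1.1) scaled by (9.1.7):
    p_n(T) - p_0 = delta * (#up-steps - #down-steps)."""
    rng = random.Random(seed)
    h = T / n
    delta = sigma * math.sqrt(h)
    pi = 0.5 * (1.0 + (mu / sigma) * math.sqrt(h))
    draws = []
    for _ in range(n_paths):
        ups = sum(1 for _ in range(n) if rng.random() < pi)
        draws.append(delta * (2 * ups - n))
    return draws


def sample_moments(xs):
    """Sample mean, variance, and kurtosis (a normal distribution has kurtosis 3)."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    kurt = sum((x - m) ** 4 for x in xs) / len(xs) / var ** 2
    return m, var, kurt


def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


if __name__ == "__main__":
    mu, sigma, T = 0.1, 0.2, 1.0
    draws = simulate_p_n_T(mu, sigma, T, n=200, n_paths=5000, seed=7)
    print("mean, var, kurtosis:", [round(v, 4) for v in sample_moments(draws)])
    for x in (-0.1, 0.1, 0.3):
        empirical = sum(d <= x for d in draws) / len(draws)
        print(f"P[p_n(T) <= {x}]: simulated {empirical:.3f}, "
              f"normal {normal_cdf(x, mu * T, sigma**2 * T):.3f}")
```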

In fact, a much stronger notion of convergence relates $p_n(t)$ to $p(t)$,
which involves the finite-dimensional distributions (FDDs) of the two
stochastic processes. An FDD of a stochastic process $p(t)$ is the joint distribution
of any finite number of observations of the stochastic process, i.e.,
$(p(t_1), p(t_2), \ldots, p(t_m))$ where $0 \le t_1 < \cdots < t_m \le T$. It can be shown that
all the FDDs of the stochastic process $p_n(t)$ (not just the random variable
$p_n(T)$) converge to the FDDs of a well-defined stochastic process $p(t)$.⁶ This
implies, for example, that the distribution of $p_n(t)$ converges to the
distribution of $p(t)$ for all $t \in [0, T]$ (not just at $T$), that the joint distribution of
$(p_n(t_1), p_n(t_2), p_n(t_3))$ converges to the joint distribution of $(p(t_1), p(t_2), p(t_3))$,
and so on.

In addition to the normality of $p(T)$, the stochastic process $p(t)$ possesses
the following three properties:

(B1) For any $t_1$ and $t_2$ such that $0 \le t_1 < t_2 \le T$:
$$p(t_2) - p(t_1) \sim \mathcal{N}\big(\mu(t_2 - t_1),\ \sigma^2(t_2 - t_1)\big). \tag{9.1.9}$$

(B2) For any $t_1$, $t_2$, $t_3$, and $t_4$ such that $0 \le t_1 < t_2 \le t_3 < t_4 \le T$,
the increment $p(t_2) - p(t_1)$ is statistically independent of the increment
$p(t_4) - p(t_3)$.

(B3) The sample paths of $p(t)$ are continuous.

It is a remarkable fact that $p(t)$, which is the celebrated arithmetic Brownian
motion or Wiener process, is completely characterized by these three properties.
If we set $\mu = 0$ and $\sigma = 1$, we obtain standard Brownian motion, which we
shall denote by $B(t)$. Accordingly, we may re-express $p(t)$ as

$$p(t) = \mu t + \sigma B(t), \qquad t \in [0, T]. \tag{9.1.10}$$


To develop further intuition for $p(t)$, consider the following conditional
moments:

$$\mathrm{E}[p(t_2) \mid p(t_1)] = p(t_1) + \mu(t_2 - t_1) \tag{9.1.11}$$
$$\mathrm{Var}[p(t_2) \mid p(t_1)] = \sigma^2(t_2 - t_1) \tag{9.1.12}$$
$$\mathrm{Cov}[p(t_1), p(t_2)] = \mathrm{Cov}[p(t_1),\ p(t_2) - p(t_1) + p(t_1)] \tag{9.1.13}$$
$$= \mathrm{Cov}[p(t_1),\ p(t_2) - p(t_1)] + \mathrm{Cov}[p(t_1),\ p(t_1)] \tag{9.1.14}$$
$$= \mathrm{Var}[p(t_1)] = \sigma^2 t_1. \tag{9.1.15}$$

⁶The convergence of FDDs, coupled with a technical condition called tightness, is called
weak convergence, a key tool in devising asymptotic approximations for the statistical laws
of many nonlinear estimators. See Billingsley (1968) for further details.

Figure 9.2. Sample Path and Conditional Expectation of a Brownian Motion with Drift


As for the discrete-time random walk, the conditional mean and variance
of $p(t)$ are linear in $t$ (see Figure 9.2). Properties (B2) and (B3) of Brownian
motion imply that its sample paths are very erratic and jagged: if
they were smooth, the infinitesimal increment $B(t+h) - B(t)$ would be
predictable by $B(t) - B(t-h)$, violating independence. In fact, observe that the
ratio $(B(t+h) - B(t))/h$ does not converge to a well-defined random variable
as $h$ approaches 0, since

$$\mathrm{Var}\left[\frac{B(t+h) - B(t)}{h}\right] = \frac{1}{h}. \tag{9.1.16}$$

Therefore, the derivative of Brownian motion, $B'(t)$, does not exist in the
ordinary sense, and although the sample paths of Brownian motion are
everywhere continuous, they are nowhere differentiable.
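Both the covariance result (9.1.15) and the explosion in (9.1.16) are easy to see in simulated paths. This sketch builds paths from independent normal increments as in (B1)-(B2), with hypothetical parameters and function names. It estimates $\mathrm{Cov}[p(t_1), p(t_2)]$ for $t_1 = 0.25$ and $t_2 = 0.75$, and the variance of the difference quotient for shrinking $h$.

```python
import math
import random


def brownian_paths(mu, sigma, T, n_steps, n_paths, seed=0):
    """Paths of p(t) = mu*t + sigma*B(t), p(0) = 0, on the grid t_k = k*T/n_steps,
    built from independent N(mu*h, sigma^2*h) increments as in (B1)-(B2)."""
    rng = random.Random(seed)
    h = T / n_steps
    paths = []
    for _ in range(n_paths):
        p, path = 0.0, [0.0]
        for _ in range(n_steps):
            p += mu * h + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)
            path.append(p)
        paths.append(path)
    return paths


def sample_cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def difference_quotient_var(h, n_draws, seed=0):
    """Sample variance of (B(t+h) - B(t)) / h for standard Brownian motion.
    (9.1.16) says the population value is 1/h, which explodes as h -> 0."""
    rng = random.Random(seed)
    q = [rng.gauss(0.0, math.sqrt(h)) / h for _ in range(n_draws)]
    m = sum(q) / n_draws
    return sum((x - m) ** 2 for x in q) / (n_draws - 1)


if __name__ == "__main__":
    paths = brownian_paths(0.1, 0.2, 1.0, 20, 4000, seed=11)
    p_025 = [path[5] for path in paths]    # t1 = 0.25
    p_075 = [path[15] for path in paths]   # t2 = 0.75
    print("Cov[p(0.25), p(0.75)] ~", round(sample_cov(p_025, p_075), 4),
          "vs sigma^2*t1 =", 0.04 * 0.25)
    for h in (1e-2, 1e-3, 1e-4):
        print(f"h={h:g}: Var[(B(t+h)-B(t))/h] ~ {difference_quotient_var(h, 20000):.1f} "
              f"vs 1/h = {1 / h:.0f}")
```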


9.1.2 Stochastic Differential Equations


Despite this fact, the infinitesimal increment of Brownian motion, i.e., the
limit of $B(t+h) - B(t)$ as $h$ approaches an infinitesimal of time ($dt$), has
earned the notation $dB(t)$ with its own unique interpretation because it has
become a fundamental building block for constructing other continuous-time
processes.⁷ Heuristically, $B(t+h) - B(t)$ can be viewed as Gaussian white
noise (see Chapter 2), and in the limit as $h$ becomes infinitesimally small,
$dB(t)$ is the "continuous-time version" of white noise.

It is understood that $dB(t)$ is a special kind of differential, a stochastic
differential, not to be confused with the differentials $dx$ and $dy$ of the
calculus. Nevertheless, $dB(t)$ does obey some of the mechanical relations that
ordinary differentials satisfy. For example, (9.1.10) is often expressed in
differential form as:

$$dp(t) = \mu\, dt + \sigma\, dB(t). \tag{9.1.17}$$

However, (9.1.17) cannot be treated as an ordinary differential equation,
and is called a stochastic differential equation to emphasize this fact. For
example, the natural transformation $dp(t)/dt = \mu + \sigma\, dB(t)/dt$ does not make
sense because $dB(t)/dt$ is not a well-defined mathematical object (although
$dB(t)$ is, by definition).

Indeed, thus far the symbols in (9.1.17) have little formal content beyond
their relation to (9.1.10): one set of symbols has been defined to be
equivalent to another. To give (9.1.17) independent meaning, we must develop
a more complete understanding of the properties of the stochastic
differential $dB(t)$. For example, since $dB$ is a random variable, what are its
moments? How do $(dB)^2$ and $(dB)(dt)$ behave? To answer these questions,
consider the definition of $dB(t)$:


$$dB(t) \equiv \lim_{h \to dt} B(t+h) - B(t) \tag{9.1.18}$$

and recall from (B1) that increments of Brownian motion are normally
distributed with zero mean (since $\mu = 0$) and variance equal to the differencing
interval $h$ (since $\sigma = 1$). Therefore, we have

$$\mathrm{E}[dB] = \lim_{h \to dt} \mathrm{E}[B(t+h) - B(t)] = 0 \tag{9.1.19}$$
$$\mathrm{Var}[dB] = \lim_{h \to dt} \mathrm{E}[(B(t+h) - B(t))^2] = dt \tag{9.1.20}$$
$$\mathrm{E}[(dB)(dB)] = \lim_{h \to dt} \mathrm{E}[(B(t+h) - B(t))^2] = dt \tag{9.1.21}$$
$$\mathrm{Var}[(dB)(dB)] = \lim_{h \to dt} \mathrm{E}[(B(t+h) - B(t))^4] - h^2 = o(dt) \tag{9.1.22}$$
$$\mathrm{E}[(dB)(dt)] = \lim_{h \to dt} \mathrm{E}[(B(t+h) - B(t))\, h] = 0 \tag{9.1.23}$$
$$\mathrm{Var}[(dB)(dt)] = \lim_{h \to dt} \mathrm{E}[(B(t+h) - B(t))^2 h^2] = o(dt). \tag{9.1.24}$$

⁷A complete and rigorous exposition of Brownian motion and stochastic differential
equations is beyond the scope of this text. Hoel, Port, and Stone (1972, Chapters 4-6), Merton
(1990, Chapter 3), and Schuss (1980) provide excellent coverage of this material.


From (B1) and (9.1.19)-(9.1.20) we see that $dB(t)$ may be viewed as
a normally distributed random variable with zero mean and infinitesimal
variance $dt$. Although a variance of $dt$ may seem like no variance at all,
recall that we are in a world of infinitesimals; after all, according to (9.1.17),
the expected value of $dp(t)$ is $\mu\, dt$, so a variance of $dt$ is not negligible in a
relative sense.

However, a variance of $(dt)^2$ is negligible in a relative sense (relative to
$dt$), since the square of an infinitesimal is much smaller than the
infinitesimal itself. If we treat terms of order $o(dt)$ as essentially zero, then (9.1.21)-(9.1.24)
show that $(dB)^2$ and $(dB)(dt)$ are both nonstochastic (since the
variances of the right-hand sides are of order $o(dt)$); hence the relations
$(dB)^2 = dt$ and $(dB)(dt) = 0$ are satisfied not just in expectation but exactly.
This yields the well-known multiplication rules for stochastic differentials
summarized in Table 9.1.

Table 9.1. Multiplication rules for stochastic differentials.

$$\begin{array}{c|cc} \times & dB & dt \\ \hline dB & dt & 0 \\ dt & 0 & 0 \end{array}$$

Using these rules, we can now calculate $(dp)^2$:

$$(dp)^2 = (\mu\, dt + \sigma\, dB)^2 \tag{9.1.25}$$
$$= \mu^2 (dt)^2 + \sigma^2 (dB)^2 + 2\mu\sigma (dB)(dt)$$
$$= \sigma^2\, dt. \tag{9.1.26}$$


This simple calculation shows that although $dp$ is a random variable, $(dp)^2$
is not. It also shows that $dp$ does behave like a random walk increment in
that the variance of $dp$ is proportional to the differencing interval $dt$.
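The multiplication rules in Table 9.1 have a direct numerical counterpart. Summing $(dp)^2$ over a fine grid gives approximately $\sigma^2 T$ regardless of $\mu$, while sums of $|dB|\,dt$ vanish as the grid is refined. The sketch below (hypothetical parameters and function name) illustrates this. The sum of squared increments is the realized quadratic variation of $p$.

```python
import math
import random


def realized_sums(mu, sigma, T, n_steps, seed=0):
    """Simulate dp = mu dt + sigma dB on an n_steps grid over [0, T] and return
    (sum of (dp)^2, sum of |dB| * dt). Table 9.1 implies the first tends to
    sigma^2 * T and the second to 0 as the grid is refined."""
    rng = random.Random(seed)
    h = T / n_steps
    sum_dp_sq = 0.0
    sum_abs_dB_dt = 0.0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(h))
        dp = mu * h + sigma * dB
        sum_dp_sq += dp * dp
        sum_abs_dB_dt += abs(dB) * h
    return sum_dp_sq, sum_abs_dB_dt


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        qv, cross = realized_sums(mu=0.5, sigma=0.3, T=1.0, n_steps=n, seed=4)
        print(f"n={n:>6}: sum (dp)^2 = {qv:.5f} (sigma^2*T = 0.09), sum |dB|dt = {cross:.5f}")
```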


Geometric Brownian Motion 

If the arithmetic Brownian motion $p(t)$ is taken to be the price of some
asset, Property (B1) shows that price changes over any interval will be
normally distributed. But since the support of the normal distribution is the
entire real line, normally distributed price changes imply that prices can
be negative with positive probability. Because virtually all financial assets
enjoy limited liability (the maximum loss is capped at −100% of the total
investment), negative prices are empirically implausible.

As in Section 1.4.2 of Chapter 1 and Section 2.1.1 of Chapter 2, we may
eliminate this problem by defining $p(t)$ to be the natural logarithm of price $P(t)$.
Under this definition, $p(t)$ can be an arithmetic Brownian motion without
violating limited liability, since the price $P(t) = e^{p(t)}$ is always nonnegative.
The price process $P(t) = e^{p(t)}$ is said to be a geometric Brownian motion or
lognormal diffusion. We shall examine the statistical properties of both arithmetic
and geometric Brownian motion in considerably more detail in Section 9.3.


Itô's Lemma


Although the first complete mathematical theory of Brownian motion is due
to Wiener (1923),⁸ it is the seminal contribution of Itô (1951) that is largely
responsible for the enormous number of applications of Brownian motion to
problems in mathematics, statistics, physics, chemistry, biology, engineering,
and of course, financial economics. In particular, Itô constructs a broad
class of continuous-time stochastic processes based on Brownian motion,
now known as Itô processes or Itô stochastic differential equations, which is closed
under general nonlinear transformations; that is, an arbitrary nonlinear
function $f(p, t)$ of an Itô process $p$ and time $t$ is itself an Itô process.

More importantly, Itô (1951) provides a formula, Itô's Lemma, for
calculating explicitly the stochastic differential equation that governs the
dynamics of $f(p, t)$:

$$df(p, t) = \frac{\partial f}{\partial p}\, dp + \frac{\partial f}{\partial t}\, dt + \frac{1}{2}\frac{\partial^2 f}{\partial p^2}\, (dp)^2. \tag{9.1.27}$$
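Itô's Lemma (9.1.27) can be applied mechanically once $(dp)^2$ is replaced by $b^2\,dt$ for a process $dp = a\,dt + b\,dB$. The sketch below is a numerical illustration only: the helper name is hypothetical, and partial derivatives are approximated by central finite differences rather than computed symbolically. For $f = p^2$ the exact answer is a drift of $2ap + b^2$ and a diffusion of $2bp$; for $f = e^p$ it should agree with (9.1.29) below.

```python
import math


def ito_coefficients(f, p, t, a, b, eps=1e-4):
    """Drift and diffusion of df(p, t) when dp = a dt + b dB.

    Substituting (dp)^2 = b^2 dt into (9.1.27) gives
        df = (f_t + a f_p + 0.5 b^2 f_pp) dt + b f_p dB.
    The partial derivatives are approximated by central differences with step eps.
    """
    f_p = (f(p + eps, t) - f(p - eps, t)) / (2.0 * eps)
    f_pp = (f(p + eps, t) - 2.0 * f(p, t) + f(p - eps, t)) / eps ** 2
    f_t = (f(p, t + eps) - f(p, t - eps)) / (2.0 * eps)
    drift = f_t + a * f_p + 0.5 * b ** 2 * f_pp
    diffusion = b * f_p
    return drift, diffusion


if __name__ == "__main__":
    mu, sigma, p = 0.1, 0.2, 0.3
    # f(p, t) = p^2: drift 2*mu*p + sigma^2, diffusion 2*sigma*p.
    print(ito_coefficients(lambda x, t: x * x, p, 0.0, mu, sigma),
          (2 * mu * p + sigma ** 2, 2 * sigma * p))
    # f(p, t) = exp(p): drift (mu + sigma^2/2) e^p, diffusion sigma e^p.
    P = math.exp(p)
    print(ito_coefficients(lambda x, t: math.exp(x), p, 0.0, mu, sigma),
          ((mu + 0.5 * sigma ** 2) * P, sigma * P))
```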


The modest term "lemma" hardly does justice to the wide-ranging impact
(9.1.27) has had; this powerful tool allows us to quantify the evolution of
complex stochastic systems in a single step. For example, let $p$ denote the
log-price process of an asset and suppose it satisfies (9.1.17); what are the
dynamics of the price process $P(t) = e^{p(t)}$? Itô's Lemma provides us with an
immediate answer:

$$dP = \frac{\partial P}{\partial p}\, dp + \frac{1}{2}\frac{\partial^2 P}{\partial p^2}\, (dp)^2 \tag{9.1.28}$$
$$= e^{p}\, dp + \frac{1}{2} e^{p} (dp)^2$$
$$= P(\mu\, dt + \sigma\, dB) + \frac{1}{2} P(\sigma^2\, dt)$$
$$dP = \left(\mu + \tfrac{1}{2}\sigma^2\right) P\, dt + \sigma P\, dB. \tag{9.1.29}$$

⁸See Jerison, Singer, and Stroock (1996) for an excellent historical retrospective of
Wiener's research, which includes several articles about Wiener's influence on modern financial
economics.


In contrast to arithmetic Brownian motion (9.1.17), we see from (9.1.29)
that the instantaneous mean and standard deviation of the geometric
Brownian motion are proportional to $P$. Alternatively, (9.1.29) implies that the
instantaneous percentage price change $dP/P$ behaves like an arithmetic
Brownian motion or random walk, which of course is precisely the case given the
exponential transformation.
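One implication of (9.1.29) is that the expected price grows at rate $\mu + \frac{1}{2}\sigma^2$, not $\mu$: $\mathrm{E}[P(T)] = P(0)\,e^{(\mu + \sigma^2/2)T}$. A quick Monte Carlo check with hypothetical parameter values and a hypothetical function name follows. It draws $P(T) = P(0)\exp(\mu T + \sigma B(T))$ directly, so every draw is positive, as limited liability requires.

```python
import math
import random


def gbm_terminal_mean(p0, mu, sigma, T, n_paths, seed=0):
    """Monte Carlo estimate of E[P(T)] when p = log P follows (9.1.17) with
    p(0) = log p0, so that P(T) = p0 * exp(mu*T + sigma*B(T)), B(T) ~ N(0, T)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += p0 * math.exp(mu * T + sigma * math.sqrt(T) * rng.gauss(0.0, 1.0))
    return total / n_paths


if __name__ == "__main__":
    p0, mu, sigma, T = 100.0, 0.05, 0.4, 1.0
    estimate = gbm_terminal_mean(p0, mu, sigma, T, 200000, seed=5)
    ito = p0 * math.exp((mu + 0.5 * sigma ** 2) * T)   # growth rate mu + sigma^2/2 from (9.1.29)
    naive = p0 * math.exp(mu * T)                       # ignores the (dp)^2 term
    print(f"simulated {estimate:.2f}, Ito {ito:.2f}, naive {naive:.2f}")
```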

We provide a considerably less trivial example of the power of Itô's
Lemma in Section 9.2: Merton's derivation of the Black-Scholes formula.


9.2 A Brief Review of Derivative Pricing Methods


Although we assume that readers are already familiar with the theoretical
aspects of pricing options and other derivative securities, we shall provide a
very brief review here, primarily to develop terminology and define notation.
Our derivation is deliberately terse, and we urge readers unfamiliar with
these models to spend some time with Merton's (1990, Chapter 8) definitive
treatment of the subject. Also, for expositional economy we shall confine
our attention to plain vanilla options in this chapter, i.e., simple call and put
options with no special features, and the underlying asset is assumed to be
common stock.⁹

Denote by $G(P(t), t)$ the price at time $t$ of a European call option with
strike price $X$ and expiration date $T > t$ on a stock with price $P(t)$ at time
$t$.¹⁰ Of course, $G$ also depends on other quantities such as the maturity date
$T$, the strike price $X$, and other parameters, but we shall usually suppress
these arguments except when we wish to focus on them specifically.

⁹However, the techniques reviewed in this section have been applied in similar fashion
to literally hundreds of other types of derivative securities; hence they are considerably more
general than they may appear to be.

¹⁰Recall that a call option gives the holder the right to purchase the underlying asset for
$X$ and a put option gives the holder the right to sell the underlying asset for $X$. A European
option is one that can be exercised only on the maturity date. An American option is one that
can be exercised on or before the maturity date. For simplicity we shall deal only with European
options in this chapter. See Cox and Rubinstein (1985), Hull (1993), and Merton (1990) for
institutional details and differences between the pricing of American and European options.

However, expressing $G$ as a function of the current stock price $P(t)$, and
not of past prices, is an important restriction that greatly simplifies the task
of finding $G$ (in Section 9.4 we shall consider options that do not satisfy
this restriction). In addition, Black and Scholes (1973) make the following
assumptions:


(A1) There are no market imperfections, e.g., taxes, transactions costs, short-sales
constraints, and trading is continuous and frictionless.

(A2) There is unlimited riskless borrowing and lending at the continuously compounded
rate of return $r$; hence a one-dollar investment in such an asset over the time
interval $\tau$ grows to $e^{r\tau}$ dollars. Alternatively, if $D(t)$ is the date-$t$ price of a discount bond
maturing at date $T$ with a face value of one dollar, then for $t \in [0, T]$ the bond price dynamics
are given by

$$dD(t) = rD(t)\, dt. \tag{9.2.1}$$

(A3) The stock price dynamics are given by a geometric Brownian motion, the solution
to the following Itô stochastic differential equation on $t \in [0, T]$:

$$dP(t) = \mu P(t)\, dt + \sigma P(t)\, dB(t), \qquad P(0) = P_0 > 0, \tag{9.2.2}$$

where $B(t)$ is a standard Brownian motion, and at least one investor observes $\sigma$
without error.

(A4) There is no arbitrage.


9.2.1 The Black-Scholes and Merton Approach 


The goal is to narrow down the possible expressions for \(G\), with the hope of obtaining a specific formula for it. Black and Scholes (1973) and Merton (1973b) accomplish this by considering relations among the dynamics of the option price \(G\), the stock price \(P\), and the riskless bond \(D\). To do this, we first derive the dynamics of the option price by assuming that \(G\) is only a function of the current stock price \(P\) and of \(t\) itself, and applying Itô's Lemma (see Section 9.1.2 and Merton [1990, Chapter 3]) to the function \(G(P(t), t)\), which yields the dynamics of the option price:


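$$ dG = \mu_g G\,dt + \sigma_g G\,dB(t), \tag{9.2.3} $$

where

$$ \mu_g \equiv \frac{1}{G}\left[\frac{\partial G}{\partial t} + \mu P\,\frac{\partial G}{\partial P} + \frac{1}{2}\sigma^2 P^2\,\frac{\partial^2 G}{\partial P^2}\right], \tag{9.2.4} $$

$$ \sigma_g \equiv \frac{1}{G}\,\sigma P\,\frac{\partial G}{\partial P}. \tag{9.2.5} $$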
К G ӘР 


Unfortunately, this expression does not seem to provide any obvious restrictions that might allow us to narrow down the choices for \(G\). One possibility is to set \(\mu_g\) equal to some "required" rate of return \(r_g\), one that comes from equilibrium considerations of the corresponding risk of the option, much like the CAPM (see Chapter 5). If such an \(r_g\) can be identified, the condition \(\mu_g = r_g\) reduces to a PDE which, under some regularity and boundary conditions, possesses a unique solution. This is the approach taken by Black and Scholes (1973).¹¹ However, this approach requires more structure than we have imposed in (A1)-(A4): a fully articulated dynamic model of economic equilibrium in which \(r_g\) can be determined explicitly must be specified.

Merton (1973b) presents an alternative to the equilibrium approach of Black and Scholes (1973) in which the same option-pricing formula is obtained but without any additional assumptions beyond (A1)-(A4). He does this by constructing a portfolio of stocks, options, and riskless bonds that requires no initial net investment and no additional funds between 0 and \(T\): a self-financing portfolio, in which long positions are completely financed by short positions. Specifically, denote by \(I_p(t)\) the dollar amount invested in the stock at date \(t\), \(I_D(t)\) the dollar amount invested at date \(t\) in riskless bonds that mature at date \(T\), and \(I_g(t)\) the dollar amount invested in the call option at date \(t\). Then the zero net investment condition may be expressed as:


$$ I_p(t) + I_D(t) + I_g(t) = 0, \qquad \forall\, t \in [0, T]. \tag{9.2.6} $$


Portfolios satisfying (9.2.6) are called arbitrage portfolios. Merton (1969) shows that the instantaneous dollar return \(dI\) to this arbitrage portfolio is:

$$ dI = \frac{I_p}{P}\,dP + \frac{I_D}{D}\,dD + \frac{I_g}{G}\,dG, \tag{9.2.7} $$

where the stochastic differentials \(dP\) and \(dD\) are given in (9.2.2) and (9.2.1), respectively, and \(dG\) follows from Itô's Lemma:

$$ dG = \mu_g G\,dt + \sigma_g G\,dB, \tag{9.2.8} $$

$$ \mu_g \equiv \frac{\mu P\,\partial G/\partial P + \partial G/\partial t + \frac{1}{2}\sigma^2 P^2\,\partial^2 G/\partial P^2}{G}, \tag{9.2.9} $$

$$ \sigma_g \equiv \frac{\sigma P\,\partial G/\partial P}{G}. \tag{9.2.10} $$

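Substituting the dynamics of \(P(t)\), \(D(t)\), and \(G(t)\) into (9.2.7) and imposing (9.2.6), which lets us eliminate \(I_D = -I_p - I_g\), yields

$$ dI = \left[I_p(\mu - r) + I_g(\mu_g - r)\right]dt + \left[\sigma I_p + \sigma_g I_g\right]dB(t). \tag{9.2.11} $$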


¹¹In particular, Black and Scholes (1973) assume that the CAPM holds, and obtain \(r_g\) by appealing to the security-market-line relation, which links expected returns to beta. However, the CAPM is not a dynamic model of equilibrium returns, and there are some subtle but significant inconsistencies between the CAPM and continuous-time option-pricing models (see, for example, Dybvig and Ingersoll [1982]). Nevertheless, Rubinstein (1976) provides a dynamic equilibrium model in which the Black-Scholes formula holds.


Now let us further restrict this arbitrage portfolio to be completely riskless in the sense that its return is nonstochastic on \([0, T]\). This may be guaranteed by choosing \(I_p\) and \(I_g\) so that

$$ \sigma I_p + \sigma_g I_g = 0, \qquad \forall\, t \in [0, T], \tag{9.2.12} $$

implying that

$$ \frac{n_p(t)}{n_g(t)} = -\frac{\partial G}{\partial P} \tag{9.2.13} $$

for every \(t \in [0, T]\), where \(n_p(t)\) and \(n_g(t)\) are the number of shares of the stock and the option, respectively, held in the self-financing, zero-risk portfolio (so that \(I_p = n_p P\) and \(I_g = n_g G\)). Note that unless \(\partial G/\partial P\) is constant through time, both \(n_p(t)\) and \(n_g(t)\) must be time-varying to ensure that this portfolio is riskless at all times. Such a portfolio is said to be perfectly hedged, and \(\partial G/\partial P\) is known as the hedge ratio.

Imposing (9.2.6) and (9.2.13) for all \(t \in [0, T]\) yields a dynamic portfolio strategy that is riskless and requires no net investment. But then surely its nonstochastic return \(dI\) must be zero; otherwise there would exist a costless, riskless dynamic portfolio strategy that yields a positive return. Therefore, to avoid arbitrage, it must be the case that

$$ dI = \left[I_p(\mu - r) + I_g\big(\mu_g(t) - r\big)\right]dt = 0, \qquad \forall\, t \in [0, T], \tag{9.2.14} $$

where a time argument has been added to \(\mu_g(t)\) to emphasize the fact that it varies through time as well.

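Surprisingly, this simple no-arbitrage condition reduces the possible choices of \(G\) to just one expression, which is a second-order linear parabolic PDE for \(G\). To see this, solve (9.2.12) for \(I_p\) and substitute into (9.2.14) to obtain \(\mu_g - r = (\sigma_g/\sigma)(\mu - r)\); inserting (9.2.9) and (9.2.10) into this relation, the terms involving \(\mu\) cancel, leaving

$$ \frac{1}{2}\sigma^2 P^2\,\frac{\partial^2 G}{\partial P^2} + rP\,\frac{\partial G}{\partial P} + \frac{\partial G}{\partial t} - rG = 0, \tag{9.2.15} $$

subject to the following two boundary conditions: \(G(P(T), T) = \max[P(T) - X, 0]\) and \(G(0, t) = 0\). The unique solution is, of course, the Black-Scholes formula: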


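$$ G(P(t), t) = P(t)\,\Phi(d_1) - X e^{-r(T-t)}\,\Phi(d_2), \tag{9.2.16} $$

$$ d_1 \equiv \frac{\log\big(P(t)/X\big) + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}, \tag{9.2.17} $$

$$ d_2 \equiv \frac{\log\big(P(t)/X\big) + \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}, \tag{9.2.18} $$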




where \(\Phi(\cdot)\) is the standard normal cumulative distribution function, and \(T - t\) is the time-to-maturity of the option.
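As a sketch of how (9.2.16)-(9.2.18) can be evaluated, and of the claim that the formula solves the PDE (9.2.15), the Python code below prices a call and evaluates the left-hand side of (9.2.15) by central finite differences. The helper names `bs_call` and `pde_residual`, the step size `h`, and the parameter values are illustrative choices of ours; `statistics.NormalDist` (Python 3.8+) supplies \(\Phi\).

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def bs_call(P, X, r, sigma, tau):
    """Black-Scholes call price (9.2.16)-(9.2.18); tau = T - t > 0 is time-to-maturity."""
    d1 = (log(P / X) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return P * Phi(d1) - X * exp(-r * tau) * Phi(d2)

def pde_residual(P, X, r, sigma, tau, h=1e-3):
    """Left-hand side of the PDE (9.2.15), computed with central finite differences.

    G depends on calendar time t only through tau = T - t, so dG/dt = -dG/dtau.
    """
    def G(p, s):
        return bs_call(p, X, r, sigma, s)

    g = G(P, tau)
    g_P = (G(P + h, tau) - G(P - h, tau)) / (2 * h)
    g_PP = (G(P + h, tau) - 2 * g + G(P - h, tau)) / h**2
    g_t = -(G(P, tau + h) - G(P, tau - h)) / (2 * h)
    return 0.5 * sigma**2 * P**2 * g_PP + r * P * g_P + g_t - r * g

price = bs_call(P=100.0, X=100.0, r=0.05, sigma=0.20, tau=1.0)
residual = pde_residual(P=100.0, X=100.0, r=0.05, sigma=0.20, tau=1.0)
print(f"call price = {price:.4f}, PDE residual = {residual:.2e}")
```

For these inputs the price is about 10.45, and the residual is zero up to finite-difference error, as (9.2.15) requires.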

The importance of assumptions (A1) and (A3) should now be apparent: it is the combination of Brownian motion (with its continuous sample paths) and the ability to trade continuously that enables us to construct a perfectly hedged portfolio. If either of these assumptions failed, there would be occasions when the return to the arbitrage portfolio is nonzero and stochastic, i.e., risky. If so, the arbitrage argument no longer applies.

Merton's derivation of the Black-Scholes formula also showcases the power of Itô's Lemma, which gives us the dynamics (9.2.8) of the option price \(G\) that led to the PDE (9.2.15). In a discrete-time setting, obtaining the dynamics of a nonlinear function of even the simplest linear stochastic process is generally hopeless, and yet this is precisely what is required to construct a perfectly hedged portfolio.

More importantly, the existence of a self-financing perfectly hedged portfolio of options, stocks, and bonds implies that the option may be synthetically replicated by a self-financing dynamic trading strategy consisting of only stocks and bonds. The initial cost of setting up this replicating portfolio of stocks and bonds must then equal the option's price to rule out arbitrage, because the replicating portfolio is self-financing and duplicates the option's payoff at maturity. The hedge ratio (9.2.13) provides the recipe for implementing the replicating strategy.
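The replication argument can be illustrated by simulation. The sketch below is our own illustration, not a procedure from the text. It starts with the Black-Scholes price as initial capital, holds \(\partial G/\partial P = \Phi(d_1)\) shares per option (the replicating counterpart of the hedge ratio (9.2.13)), and keeps the remainder in the riskless asset, rebalancing only at discrete dates. The real-world drift `mu`, the rebalancing counts, and all parameter values are assumptions. Because continuous trading is only approximated, the terminal hedging error is not exactly zero, but it should shrink as rebalancing becomes more frequent.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def call_price_and_delta(P, X, r, sigma, tau):
    """Black-Scholes price (9.2.16) and delta dG/dP = Phi(d1)."""
    d1 = (log(P / X) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return P * Phi(d1) - X * exp(-r * tau) * Phi(d2), Phi(d1)

def hedging_error(P0, X, r, mu, sigma, T, steps, rng):
    """Terminal value of a discretely rebalanced stock/bond portfolio minus the call payoff."""
    dt = T / steps
    P = P0
    V, delta = call_price_and_delta(P0, X, r, sigma, T)  # initial capital = option price
    for k in range(1, steps + 1):
        bond = V - delta * P  # self-financing: the rest sits in the riskless asset
        # exact step of (9.2.2) under the real-world drift mu
        P *= exp((mu - 0.5 * sigma**2) * dt + sigma * sqrt(dt) * rng.gauss(0.0, 1.0))
        V = delta * P + bond * exp(r * dt)
        if k < steps:  # rebalance to the new hedge ratio
            _, delta = call_price_and_delta(P, X, r, sigma, T - k * dt)
    return V - max(P - X, 0.0)

def rms_hedging_error(steps, paths=1000, seed=0):
    """Root-mean-square hedging error over simulated paths (illustrative parameters)."""
    rng = random.Random(seed)
    errors = [hedging_error(40.0, 35.0, 0.05, 0.15, 0.30, 1.0, steps, rng)
              for _ in range(paths)]
    return sqrt(sum(e * e for e in errors) / paths)

err_coarse = rms_hedging_error(steps=10)   # ten rebalancing dates over one year
err_fine = rms_hedging_error(steps=250)    # roughly daily rebalancing
print(f"RMS hedging error: 10 steps {err_coarse:.3f}, 250 steps {err_fine:.3f}")
```

Note that the drift `mu` used to generate prices never enters the portfolio weights, consistent with the absence of \(\mu\) from (9.2.15).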


Option Sensitivities 

The sensitivities of \(G\) to its other arguments also play crucial roles in trading and managing portfolios of options, so much so that the partial derivatives¹² of \(G\) with respect to its arguments have been assigned specific Greek letters in the parlance of investment professionals and are now known collectively as option sensitivities or, less formally, as the "Greeks":¹³


$$ \text{Delta} \qquad \Delta \equiv \frac{\partial G}{\partial P} \tag{9.2.19} $$

$$ \text{Gamma} \qquad \Gamma \equiv \frac{\partial^2 G}{\partial P^2} \tag{9.2.20} $$

$$ \text{Theta} \qquad \Theta \equiv \frac{\partial G}{\partial t} \tag{9.2.21} $$

$$ \text{Rho} \qquad \mathcal{R} \equiv \frac{\partial G}{\partial r} \tag{9.2.22} $$

$$ \text{Vega} \qquad \mathcal{V} \equiv \frac{\partial G}{\partial \sigma} \tag{9.2.23} $$

¹²The term "partial derivatives" in this context refers, of course, to instantaneous rates of change. This unfortunate coincidence of terminology is usually not a source of confusion, but readers should beware.

¹³Of course, "vega" is not a Greek letter, and \(\mathcal{V}\) is simply a script V.


For the Black-Scholes formula (9.2.16), these option sensitivities can be evaluated explicitly:

$$ \Delta = \Phi(d_1) \tag{9.2.24} $$

$$ \Gamma = \frac{\phi(d_1)}{P\sigma\sqrt{T-t}} \tag{9.2.25} $$

$$ \Theta = -\frac{P\sigma}{2\sqrt{T-t}}\,\phi(d_1) - Xr\,e^{-r(T-t)}\,\Phi(d_2) \tag{9.2.26} $$

$$ \mathcal{R} = (T-t)\,X e^{-r(T-t)}\,\Phi(d_2) \tag{9.2.27} $$

$$ \mathcal{V} = P\sqrt{T-t}\,\phi(d_1) \tag{9.2.28} $$

where \(\phi(\cdot)\) is the standard normal probability density function. We shall have occasion to consider these measures again once we have developed methods for estimating option prices in Section 9.3.3.
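The closed forms (9.2.24)-(9.2.28) can be cross-checked against numerical differentiation of (9.2.16). The sketch below does this for one illustrative set of inputs; `bs_greeks` and `fd_greeks` are hypothetical helper names, and theta is taken with respect to calendar time \(t\), matching (9.2.21). It also confirms that the Greeks satisfy the PDE (9.2.15) identically, since \(\Theta + rP\Delta + \frac{1}{2}\sigma^2P^2\Gamma - rG = 0\).

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()  # N.cdf is Phi, N.pdf is phi

def _d1_d2(P, X, r, sigma, tau):
    s = sigma * sqrt(tau)
    d1 = (log(P / X) + (r + 0.5 * sigma**2) * tau) / s
    return d1, d1 - s

def bs_call(P, X, r, sigma, tau):
    d1, d2 = _d1_d2(P, X, r, sigma, tau)
    return P * N.cdf(d1) - X * exp(-r * tau) * N.cdf(d2)

def bs_greeks(P, X, r, sigma, tau):
    """Closed forms (9.2.24)-(9.2.28); tau = T - t."""
    d1, d2 = _d1_d2(P, X, r, sigma, tau)
    disc = X * exp(-r * tau)  # present value of the strike
    return {
        "delta": N.cdf(d1),
        "gamma": N.pdf(d1) / (P * sigma * sqrt(tau)),
        "theta": -P * sigma * N.pdf(d1) / (2 * sqrt(tau)) - r * disc * N.cdf(d2),
        "rho": tau * disc * N.cdf(d2),
        "vega": P * sqrt(tau) * N.pdf(d1),
    }

def fd_greeks(P, X, r, sigma, tau, h=1e-4):
    """Central finite-difference approximations to the same partial derivatives."""
    G = bs_call
    return {
        "delta": (G(P + h, X, r, sigma, tau) - G(P - h, X, r, sigma, tau)) / (2 * h),
        "gamma": (G(P + h, X, r, sigma, tau) - 2 * G(P, X, r, sigma, tau)
                  + G(P - h, X, r, sigma, tau)) / h**2,
        # t enters only through tau = T - t, so dG/dt = -dG/dtau
        "theta": -(G(P, X, r, sigma, tau + h) - G(P, X, r, sigma, tau - h)) / (2 * h),
        "rho": (G(P, X, r + h, sigma, tau) - G(P, X, r - h, sigma, tau)) / (2 * h),
        "vega": (G(P, X, r, sigma + h, tau) - G(P, X, r, sigma - h, tau)) / (2 * h),
    }

closed = bs_greeks(40.0, 35.0, 0.05, 0.30, 1.0)
numeric = fd_greeks(40.0, 35.0, 0.05, 0.30, 1.0)
for name in closed:
    print(f"{name:>5}: closed form {closed[name]:.6f}, finite difference {numeric[name]:.6f}")
```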


9.2.2 The Martingale Approach 


Once Black and Scholes (1973) and Merton (1973b) presented their option- 
pricing models, it quickly became apparent that their approach could be 
used to price a variety of other securities whose payoffs depend on the prices 
of other securities: find some dynamic, costless self-financing portfolio strategy that can replicate the payoff of the derivative, and impose the no-arbitrage condition. This condition usually reduces to a PDE like (9.2.15),
subject to boundary conditions that are determined by the specific terms of 
the derivative security. 

It is an interesting fact that pricing derivative securities along these 
lines does not require any restrictions on agents’ preferences other than 
nonsatiation, i.e., agents prefer more to less, which rules out arbitrage opportunities. Therefore, the pricing formula for any derivative security that can be priced in this fashion must be identical for all preferences that do not admit arbitrage. In particular, the pricing formula must be the same regardless of agents' risk tolerances, so that an economy composed of risk-neutral investors must yield the same option price as an economy composed of risk-averse investors. But under risk-neutrality, all assets must earn the same expected rate of return which, under assumption (A2), must equal the riskless rate \(r\). This fundamental insight, due to Cox and Ross (1976), simplifies the computation of option-pricing formulas enormously, because in a risk-neutral economy the option's price is simply the expected value of its payoff discounted at the riskless rate:


$$ G(P(t), t) = e^{-r(T-t)}\,E_t\big[\max[P^*(T) - X, 0]\big]. \tag{9.2.29} $$


However, the conditional expectation in (9.2.29) must be evaluated with respect to the random variable \(P^*(T)\), not \(P(T)\), where \(P^*(T)\) is the terminal stock price adjusted for risk-neutrality.

Specifically, under assumption (A3), the conditional distribution of \(P(T)\) given \(P(t)\) is simply a lognormal distribution with \(E[\log P(T) \mid P(t)] = \log P(t) + (\mu - \frac{1}{2}\sigma^2)(T - t)\) and \(\mathrm{Var}[\log P(T) \mid P(t)] = \sigma^2(T - t)\). Under risk-neutrality, the expected rate of return for all assets must be \(r\), and hence the conditional distribution of the risk-neutralized terminal stock price \(P^*(T)\) is also lognormal, but with \(E[\log P^*(T) \mid P(t)] = \log P(t) + (r - \frac{1}{2}\sigma^2)(T - t)\) and \(\mathrm{Var}[\log P^*(T) \mid P(t)] = \sigma^2(T - t)\).

For obvious reasons, this procedure is called the risk-neutral pricing method, and under assumptions (A1) through (A4) the expectation in (9.2.29) may be evaluated explicitly as a function of the standard normal CDF, which yields the Black-Scholes formula (9.2.16).

To emphasize the generality of the risk-neutral pricing method in valuing arbitrary payoff streams, (9.2.29) is often rewritten as

$$ G(P(t), t) = e^{-r(T-t)}\,E_t^*\big[\max[P(T) - X, 0]\big], \tag{9.2.30} $$

where the asterisk in \(E^*\) indicates that the expectation is to be taken with respect to an adjusted probability distribution, adjusted to be consistent with risk-neutrality. In a more formal setting, Harrison and Kreps (1979) have shown that the adjusted probability distribution is precisely the distribution under which the discounted stock price \(e^{-rt}P(t)\) follows a martingale; thus they call the adjusted distribution the equivalent martingale measure. Accordingly, the risk-neutral pricing method is also known as the martingale pricing technique. We shall exploit this procedure extensively in Section 9.4, where we propose to evaluate expectations like (9.2.30) by Monte Carlo simulation.
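A bare-bones Monte Carlo version of (9.2.29), without any variance reduction, is sketched below: it draws \(\log P^*(T)\) from the risk-neutral normal distribution given in the text, averages the discounted call payoff, and reports a standard error. The function name `mc_call`, the number of paths, the seed, and the inputs are illustrative assumptions. The estimate should lie within a few standard errors of the Black-Scholes value (about 10.45 for these inputs).

```python
import random
from math import exp, log, sqrt

def mc_call(P, X, r, sigma, tau, n_paths=200_000, seed=0):
    """Monte Carlo estimate of (9.2.29) and its standard error.

    log P*(T) ~ Normal(log P + (r - sigma^2/2) tau, sigma^2 tau); the real-world
    drift mu never appears, which is the point of risk-neutral pricing.
    """
    rng = random.Random(seed)
    m = log(P) + (r - 0.5 * sigma**2) * tau   # risk-neutral mean of log P*(T)
    s = sigma * sqrt(tau)                     # standard deviation of log P*(T)
    disc = exp(-r * tau)
    total = total_sq = 0.0
    for _ in range(n_paths):
        payoff = disc * max(exp(rng.gauss(m, s)) - X, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / n_paths
    var = (total_sq / n_paths - mean**2) * n_paths / (n_paths - 1)
    return mean, sqrt(var / n_paths)

estimate, std_err = mc_call(P=100.0, X=100.0, r=0.05, sigma=0.20, tau=1.0)
print(f"Monte Carlo price {estimate:.4f} (standard error {std_err:.4f})")
```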


9.3 Implementing Parametric Option Pricing Models 


Because there are so many different types of options and other derivative 
securities, it is virtually impossible to describe a completely general method 
for implementing all derivative-pricing formulas. The particular features 
of each derivative security will often play a central role in how its pricing 
formula is to be applied most effectively. But there are several common aspects to every implementation of a parametric option-pricing model (a model in which the price dynamics of the underlying security, called the fundamental asset, are specified up to a finite number of parameters), and we shall focus on these common aspects in this section.

To simplify terminology, unless otherwise stated we shall use the term
option to mean any general derivative security, and the term stock to mean the 
derivative security's underlying fundamental asset. Although there are cer- 
tainly aspects of some derivative securities that differ dramatically from those 
of standard equity options and cannot be described in analogous terms, they 
need not concern us at the current level of generality. After developing a 
coherent framework for implementing general pricing formulas, we shall 
turn to modifications tailored to particular derivative securities. 


9.3.1 Parameter Estimation of Asset Price Dynamics


The term "parametric" in this section's title is meant to emphasize the reliance of a class of option-pricing formulas on particular assumptions concerning the fundamental asset's price dynamics. Although these rather strong assumptions often yield elegant and tractable expressions for the option's price, they are typically contradicted by the data, which does not bode well for the pricing formula's success. In fact, perhaps the most important aspect of a successful empirical implementation of any option-pricing model is correctly identifying the dynamics of the stock price, and uncertainty regarding these price dynamics will lead us to consider nonparametric alternatives in Chapter 12.

But for the moment, let us assert that the specific form of the stock price process \(P(t)\) is known up to a vector of unknown parameters \(\theta\) which lies in some parameter space \(\Theta\), and that it satisfies the following stochastic differential equation:


$$ dP(t) = a(P, t; \alpha)\,dt + b(P, t; \beta)\,dB(t), \qquad t \in [0, T], \tag{9.3.1} $$


where \(B(t)\) is a standard Wiener process and \(\theta \equiv [\alpha'\ \beta']'\) is a \((k \times 1)\) vector of unknown parameters.¹⁴ The functions \(a(P, t; \alpha)\) and \(b(P, t; \beta)\) are called the drift and diffusion functions, respectively, or the coefficient functions collectively. For example, the lognormal diffusion assumed by Black and Scholes (1973) is given by the following coefficient functions:


$$ a(P, t; \alpha) = \mu P, \tag{9.3.2} $$

$$ b(P, t; \beta) = \sigma P. \tag{9.3.3} $$

In this case, the parameter vector \(\theta\) consists of only two elements, the constants \(\mu\) and \(\sigma\).


In the more general case the functions \(a(P, t; \alpha)\) and \(b(P, t; \beta)\) must be restricted in some fashion so as to ensure the existence of a solution to the stochastic differential equation (9.3.1) (see Arnold [1974], for example). Also, for tractability we assume that the coefficient functions depend only on the most recent price \(P(t)\); hence the solution to (9.3.1) is a Markov process. This assumption is not as restrictive as it seems, since non-Markov processes can often be re-expressed as a vector Markov process by expansion of the states, i.e., by increasing the number of state variables so that the collection of prices and state variables is a vector Markov process. In practice, however, expanding the states can also create intractabilities that may be more difficult to overcome than the non-Markovian nature of the original price process.

For option-pricing purposes, what concerns us is estimating \(\theta\), since pricing formulas for options on \(P(t)\) will invariably be functions of some or all of the parameters in \(\theta\). In particular, suppose that an explicit expression for the option-pricing function exists and is given by \(G(P(t); \theta)\), where other dependencies on observable constants such as strike price, time-to-maturity, and the interest rate have been suppressed for notational convenience.¹⁵ An estimator of the option price \(G\) may then be written as \(\hat G \equiv G(P(t); \hat\theta)\), where \(\hat\theta\) is some estimator of the parameter vector \(\theta\). Naturally, the properties of \(\hat G\) are closely related to the properties of \(\hat\theta\), so that imprecise estimators of the parameter vector will yield imprecise option prices and vice versa. To quantify this relation between the precision of \(\hat G\) and of \(\hat\theta\), we must first consider the procedure for obtaining \(\hat\theta\).


Maximum Likelihood Estimation 

The most direct method for obtaining \(\hat\theta\) is to estimate it from historical data. Suppose we have a sequence of \(n + 1\) historical observations of \(P(t)\) sampled at nonstochastic dates \(t_0 < t_1 < \cdots < t_n\), which are not necessarily equally spaced. This is an important feature of financial time series, since markets are generally closed on weekends and holidays, yielding irregular sampling intervals. Since \(P(t)\) is a continuous-time Markov process by assumption, irregular sampling poses no conceptual problems; the joint density function \(f\) of the sample is given by the following product:


$$ f(P_0, \ldots, P_n; \theta) = f_0(P_0; \theta)\prod_{k=1}^{n} f(P_k, t_k \mid P_{k-1}, t_{k-1}; \theta), \tag{9.3.4} $$


¹⁴Note that although the drift and diffusion functions depend on distinct parameter vectors \(\alpha\) and \(\beta\), these two vectors may contain some parameters in common.

¹⁵Even if \(G\) cannot be obtained in closed form, \(\theta\) is a necessary input for numerical solutions of \(G\) and must still be estimated.
where \(P_k \equiv P(t_k)\), \(f_0(P_0; \theta)\) is the marginal density function of \(P_0\), and \(f(P_k, t_k \mid P_{k-1}, t_{k-1}; \theta)\) is the conditional density function of \(P_k\) given \(P_{k-1}\), also called the transition density function. For notational simplicity, we will write \(f(P_k, t_k \mid P_{k-1}, t_{k-1}; \theta)\) simply as \(f_k\).

Given (9.3.4) and the observations \(P_0, \ldots, P_n\), the parameter vector may be estimated by the method of maximum likelihood (see Section A.4 of the Appendix and Silvey [1975, Chapter 4]). To define the maximum likelihood estimator \(\hat\theta\), let \(\mathcal{L}(\theta)\) denote the log-likelihood function, the natural logarithm of the joint density function of \(P_0, \ldots, P_n\) viewed as a function of \(\theta\):

$$ \mathcal{L}(\theta) \equiv \sum_{k=1}^{n}\log f_k + \log f_0(P_0; \theta). \tag{9.3.5} $$

The maximum likelihood estimator is then given by

$$ \hat\theta \equiv \arg\max_{\theta \in \Theta}\,\mathcal{L}(\theta). \tag{9.3.6} $$


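Under suitable regularity conditions, \(\hat\theta\) is consistent and has the following normal limiting distribution:

$$ \sqrt{n}\,(\hat\theta - \theta) \overset{a}{\sim} \mathcal{N}\big(0, \mathcal{I}^{-1}(\theta)\big), \qquad \mathcal{I}(\theta) \equiv \lim_{n \to \infty} -E\left[\frac{1}{n}\,\frac{\partial^2 \mathcal{L}(\theta)}{\partial\theta\,\partial\theta'}\right], \tag{9.3.7} $$

where \(\mathcal{I}(\theta)\) is called the information matrix. When \(n\) is large, the asymptotic distribution in (9.3.7) allows us to approximate the variance of \(\hat\theta\) as

$$ \mathrm{Var}[\hat\theta] \approx \frac{1}{n}\,\mathcal{I}^{-1}(\theta), \tag{9.3.8} $$

and the information matrix \(\mathcal{I}(\theta)\) may also be estimated in the natural way:

$$ \hat{\mathcal{I}} = -\frac{1}{n}\,\frac{\partial^2 \mathcal{L}(\hat\theta)}{\partial\theta\,\partial\theta'}. \tag{9.3.9} $$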

Moreover, \(\hat\theta\) has been shown to be asymptotically efficient in the class of all consistent and uniformly asymptotically normal (CUAN) estimators; that is, it has the smallest asymptotic variance of all estimators that are CUAN, hence it is the preferred method of estimation whenever feasible. Of course, maximum likelihood estimation is only feasible when the likelihood function can be obtained in closed form, which, in our case, requires obtaining the transition density functions \(f_k\) in closed form. Unfortunately, a closed-form expression for \(f_k\) for arbitrary drift and diffusion functions cannot be obtained in general.

However, it is possible to characterize \(f_k\) implicitly as the solution to a PDE. In particular, fix the conditioning variables \(P_{k-1}\) and \(t_{k-1}\) and let \(f_k\) be a function of \(P_k\) and \(t_k\); to emphasize this, we drop the subscript \(k\) from the arguments and write \(f_k(P, t \mid P_{k-1}, t_{k-1})\). Then it follows from the Fokker-Planck or forward equation that \(f_k\) must satisfy the following (see Lo [1988] for a derivation):

$$ \frac{\partial f_k}{\partial t} = -\frac{\partial\,[a(P, t; \alpha)\,f_k]}{\partial P} + \frac{1}{2}\,\frac{\partial^2\,[b^2(P, t; \beta)\,f_k]}{\partial P^2}, \tag{9.3.10} $$

with initial condition

$$ f_k(P, t_{k-1} \mid P_{k-1}, t_{k-1}) = \delta(P - P_{k-1}), \tag{9.3.11} $$

where \(\delta(P - P_{k-1})\) is the Dirac delta function centered at \(P_{k-1}\). Maximum likelihood estimation is feasible whenever (9.3.10) can be solved explicitly, and this depends on how complicated the coefficient functions \(a\) and \(b\) are.
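For the lognormal diffusion (9.3.2)-(9.3.3) the transition density is available in closed form: given \(P_{k-1}\), \(\log P_k\) is normal with mean \(\log P_{k-1} + (\mu - \frac{1}{2}\sigma^2)(t_k - t_{k-1})\) and variance \(\sigma^2(t_k - t_{k-1})\). One can therefore check numerically that it satisfies the forward equation (9.3.10). The Python sketch below does so at one arbitrary point using central finite differences; the function names, the step size, and the parameter values are illustrative assumptions.

```python
from math import exp, log, pi, sqrt

def transition_density(P, t, P_prev, t_prev, mu, sigma):
    """Lognormal transition density of dP = mu P dt + sigma P dB from (P_prev, t_prev) to (P, t)."""
    v = sigma**2 * (t - t_prev)                             # variance of log P(t)
    m = log(P_prev) + (mu - 0.5 * sigma**2) * (t - t_prev)  # mean of log P(t)
    return exp(-(log(P) - m) ** 2 / (2 * v)) / (P * sqrt(2 * pi * v))

def fokker_planck_sides(P, t, P_prev, t_prev, mu, sigma, h=1e-4):
    """Both sides of (9.3.10) with a = mu P and b = sigma P, via central differences."""
    def f(p, s):
        return transition_density(p, s, P_prev, t_prev, mu, sigma)

    def a_f(p):      # a(P) * f
        return mu * p * f(p, t)

    def b2_f(p):     # b(P)^2 * f
        return sigma**2 * p**2 * f(p, t)

    lhs = (f(P, t + h) - f(P, t - h)) / (2 * h)
    d_af = (a_f(P + h) - a_f(P - h)) / (2 * h)
    d2_b2f = (b2_f(P + h) - 2 * b2_f(P) + b2_f(P - h)) / h**2
    rhs = -d_af + 0.5 * d2_b2f
    return lhs, rhs

lhs, rhs = fokker_planck_sides(P=1.1, t=0.5, P_prev=1.0, t_prev=0.0, mu=0.1, sigma=0.3)
print(f"df/dt = {lhs:.6f}, right-hand side of (9.3.10) = {rhs:.6f}")
```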

Once \(f_k\) is obtained, \(\hat\theta\) can be computed numerically given the data \(P_0, \ldots, P_n\). To interpret this estimator, we must check that the regularity conditions for the consistency and asymptotic normality of \(\hat\theta\) are satisfied. In some cases of interest they are not. For example, a lognormal diffusion \(dP = \mu P\,dt + \sigma P\,dB\) violates the stationarity requirement. But in this case, a simple log-transformation of the data does satisfy the regularity conditions: \(\{r_k\}\), where \(r_k \equiv \log(P_k/P_{k-1})\), is a stationary sequence. Maximum likelihood estimation of \(\theta\) may then be performed with \(\{r_k\}\). We shall return to this example in Section 9.3.2.


GMM Estimation 

For many combinations of coefficient functions \(a\) and \(b\), (9.3.10) cannot be solved explicitly; hence for these cases maximum likelihood estimation is infeasible. An alternative proposed by Hansen and Scheinkman (1995) is to apply Hansen's (1982) generalized method of moments (GMM) estimator, which they extend to the case of strictly stationary continuous-time Markov processes (see the Appendix for an exposition of GMM).

The focus of any GMM procedure is, of course, the moment conditions in which the parameter vector \(\theta\) is implicitly defined. The GMM estimator is that parameter vector \(\hat\theta\) that minimizes the "distance" between the sample moment conditions and their population counterparts.

The properties of a GMM estimator depend critically on the choice of moment conditions and the distance metric, and for standard discrete-time GMM applications, these two issues have been studied quite thoroughly. Moment conditions are typically suggested by the interaction of conditioning information and optimality or equilibrium conditions such as Euler equations, e.g., the orthogonality conditions of linear instrumental variables estimation (see the Appendix, Section A. ). In some cases, the optimal (in an asymptotic sense) distance metric can be deduced explicitly (see, for example, Hamilton [1994, Chapter 14]), and efficiency bounds can be obtained (see Hansen [1985] and Hansen, Heaton, and Ogaki [1988]).

But for continuous-time applications in finance, especially those involving derivative securities, much less is known about the properties of GMM estimators. Indeed, one of Hansen and Scheinkman's (1995) main contributions is to show how to generate moment conditions for continuous-time Markov processes with discretely sampled data. Although a complete exposition of GMM estimation for continuous-time processes is beyond the scope of this text, the central thrust of their approach can be illustrated through a simple example.

Suppose we wish to estimate the parameters of the following stationary 
diffusion process: 


$$ dp = -\gamma(p - \mu)\,dt + \sigma\,dB, \qquad p(0) = p_0 > 0, \quad \gamma > 0. \tag{9.3.12} $$

This is a continuous-time version of a stationary AR(1) process with unconditional mean \(\mu\) (see Section 9.3.4 for further discussion), and hence it satisfies the hypotheses of Hansen and Scheinkman (1995).

To generate moment conditions for \(\{p(t)\}\), Hansen and Scheinkman (1995) use the infinitesimal generator \(\mathcal{D}_0\), also known as the Dynkin operator, which is the time derivative of a conditional expectation. Specifically,

$$ \mathcal{D}_0[\cdot] \equiv \frac{d}{dt}\,E_0[\cdot], \tag{9.3.13} $$

where the expectations operator \(E_0[\cdot]\) is a conditional expectation, conditioned on \(p(0) = p_0\).

This operator has several important applications for deriving moment 
conditions of diffusions. For example, consider the following heuristic cal- 
culation of the expectation of dp: 


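$$ E_0[dp] = E_0[-\gamma(p - \mu)\,dt] + E_0[\sigma\,dB] \tag{9.3.14} $$

$$ E_0[dp] = -\gamma\big(E_0[p] - \mu\big)\,dt + \sigma E_0[dB] \tag{9.3.15} $$

$$ dE_0[p] = -\gamma\big(E_0[p] - \mu\big)\,dt \tag{9.3.16} $$

$$ \frac{d}{dt}E_0[p] = -\gamma\big(E_0[p] - \mu\big) \tag{9.3.17} $$

$$ \mathcal{D}_0[p] = -\gamma\big(E_0[p] - \mu\big), \tag{9.3.18} $$

where (9.3.14) and (9.3.15) follow from the fact that the expectation of a linear function is the linear function of the expectation, (9.3.16) follows from the same property for the differential operator (so that \(E_0[dp] = dE_0[p]\)) and from the fact that increments of Brownian motion have zero expectation, and (9.3.17) is another way of expressing (9.3.16).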
Before considering the importance of (9.3.18), observe that (9.3.17) is a first-order linear ordinary differential equation in \(E_0[p]\), which can easily be solved to yield a closed-form expression for \(E_0[p]\) (note the initial condition \(E_0[p(0)] = p_0\)):

$$ E_0[p(t)] = p_0\,e^{-\gamma t} + \mu\left(1 - e^{-\gamma t}\right). $$

By applying similar arguments to the stochastic differential equations of \(p^2\), \(p^3\), and so on (which may be obtained explicitly via Itô's Lemma), all higher moments of \(p\) may be computed in the same way.
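The closed-form conditional mean can be compared with a brute-force simulation. This sketch, with parameter values chosen purely for illustration, averages \(p(t)\) over paths generated by an Euler discretization of (9.3.12). The discretization itself introduces a small bias of order \(\Delta t\), so the agreement is approximate; `euler_sample_mean` is a hypothetical helper name.

```python
import random
from math import exp, sqrt

def ou_conditional_mean(p0, mu, gamma, t):
    """Solution of the ODE (9.3.17) with initial condition E_0[p(0)] = p0."""
    return p0 * exp(-gamma * t) + mu * (1.0 - exp(-gamma * t))

def euler_sample_mean(p0, mu, gamma, sigma, t, steps=100, paths=4000, seed=1):
    """Average of p(t) over paths of (9.3.12) simulated with an Euler scheme."""
    rng = random.Random(seed)
    dt = t / steps
    total = 0.0
    for _ in range(paths):
        p = p0
        for _ in range(steps):
            p += -gamma * (p - mu) * dt + sigma * sqrt(dt) * rng.gauss(0.0, 1.0)
        total += p
    return total / paths

exact = ou_conditional_mean(p0=2.0, mu=1.0, gamma=0.5, t=2.0)
simulated = euler_sample_mean(p0=2.0, mu=1.0, gamma=0.5, sigma=0.3, t=2.0)
print(f"closed form {exact:.4f}, simulated {simulated:.4f}")
```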

Now consider the unconditional version of the infinitesimal generator, \(\mathcal{D}[\cdot] \equiv dE[\cdot]/dt\). A series of calculations similar to (9.3.14)-(9.3.18) follows, but with one important difference: the time derivative of the unconditional expectation is zero, since \(p\) is a strictly stationary process; hence we have the restriction

$$ \mathcal{D}[p] = -\gamma\big(E[p] - \mu\big) = 0, \tag{9.3.19} $$

which implies

$$ E[p] = \mu, \tag{9.3.20} $$

and this yields a first-moment condition: the unconditional expectation of \(p\) must equal the steady-state mean \(\mu\).
More importantly, we can apply the infinitesimal generator to any well-behaved transformation \(f(\cdot)\) of \(p\), and by Itô's Lemma we have

$$ \mathcal{D}[f(p)] = E\left[-\gamma(p - \mu)\,f'(p) + \frac{\sigma^2}{2}\,f''(p)\right] = 0, \tag{9.3.21} $$

which yields an infinite number of moment conditions, one for each \(f\), related to the marginal distribution of \(f(p)\). From these moment conditions, and under the regularity conditions specified by Hansen and Scheinkman (1995), GMM estimation may be performed in the usual way.
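The moment conditions (9.3.21) can be checked on simulated data. The sketch below is an illustration under our own parameter choices, not the Hansen-Scheinkman estimator itself (which also uses multipoint conditions and a weighting matrix). It samples (9.3.12) exactly at monthly spacing, using the standard fact that the discretely sampled process is an AR(1) with coefficient \(e^{-\gamma\,\Delta t}\), and evaluates sample analogues of (9.3.21) for \(f(p) = p\) and \(f(p) = p^2\) at the true parameters. The names `simulate_ou` and `generator_moments` are hypothetical.

```python
import random
from math import exp, sqrt

def simulate_ou(mu, gamma, sigma, dt, n, seed=2):
    """Exact sampling of (9.3.12) at spacing dt: p' = mu + e^{-gamma dt}(p - mu) + noise."""
    rng = random.Random(seed)
    phi = exp(-gamma * dt)
    noise_sd = sigma * sqrt((1 - phi**2) / (2 * gamma))
    p = rng.gauss(mu, sigma / sqrt(2 * gamma))  # draw p(0) from the stationary law
    path = [p]
    for _ in range(n - 1):
        p = mu + phi * (p - mu) + noise_sd * rng.gauss(0.0, 1.0)
        path.append(p)
    return path

def generator_moments(path, mu, gamma, sigma):
    """Sample means of the integrand in (9.3.21) for f(p) = p and f(p) = p**2."""
    n = len(path)
    # f(p) = p:    f' = 1,  f'' = 0
    g1 = sum(-gamma * (p - mu) for p in path) / n
    # f(p) = p**2: f' = 2p, f'' = 2
    g2 = sum(-gamma * (p - mu) * 2 * p + sigma**2 for p in path) / n
    return g1, g2

path = simulate_ou(mu=1.0, gamma=0.5, sigma=0.3, dt=1 / 12, n=50_000)
g1, g2 = generator_moments(path, mu=1.0, gamma=0.5, sigma=0.3)

# Setting both sample moments to zero and solving gives a just-identified estimator.
mu_hat = sum(path) / len(path)
ratio_hat = sum(p * (p - mu_hat) for p in path) / len(path)  # estimates sigma^2 / (2 gamma)
print(f"g1 = {g1:.4f}, g2 = {g2:.4f}, mu_hat = {mu_hat:.4f}, ratio_hat = {ratio_hat:.4f}")
```

Because the stationary distribution of (9.3.12) is normal with mean \(\mu\) and variance \(\sigma^2/(2\gamma)\), single-point conditions of the form (9.3.21) identify only \(\mu\) and the ratio \(\sigma^2/(2\gamma)\), never \(\gamma\) and \(\sigma\) separately; separating them requires information about the dynamics, such as the multipoint conditions of Hansen and Scheinkman (1995).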

Hansen and Scheinkman (1995) also generate multipoint moment conditions (conditions that exploit information contained in the conditional and joint distributions of \(f(p)\)), making creative use of the properties of time-reversed diffusions along the way, and they provide asymptotic approximations for statistical inference. Although it is too early to say how their approach will perform in practice, it seems quite promising and, for many Itô processes of practical interest, the GMM estimator is currently the only one that is both computationally feasible and consistent.


9.3.2 Estimating \(\sigma\) in the Black-Scholes Model


To illustrate the techniques described in Section 9.3.1, we consider the implementation of the Black-Scholes model (9.2.16), in which the parameter \(\sigma\) must be estimated.
A common misunderstanding about \(\sigma\) is that it is the standard deviation of simple returns \(R_t\) of the stock. If, for example, the annual standard deviation of IBM's stock return is 30%, it is often assumed that \(\sigma = 0.30\). To see why this is incorrect, let prices \(P(t)\) follow a lognormal diffusion (9.2.2) as required by the Black-Scholes model (see Section 9.2.1) and assume, for expositional simplicity, that prices are sampled at equally spaced intervals of length \(h\) in the interval \([0, T]\); hence \(P_k \equiv P(kh)\), \(k = 0, 1, \ldots, n\), and \(T = nh\). Then simple returns \(R_k(h) \equiv (P_k/P_{k-1}) - 1\) are shifted lognormal random variables (\(1 + R_k(h)\) is lognormal) with mean and variance:


$$ E[R_k(h)] = e^{\mu h} - 1, \tag{9.3.22} $$

$$ \mathrm{Var}[R_k(h)] = e^{2\mu h}\left[e^{\sigma^2 h} - 1\right]. \tag{9.3.23} $$


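Therefore, the magnitude of IBM's \(\sigma\) cannot be gauged solely by the 30% estimate, since this is an estimate of \(\sqrt{\mathrm{Var}[R_k(h)]}\) and not of \(\sigma\). In particular, solving (9.3.22) and (9.3.23) for \(\sigma\) yields the following:

$$ \sigma = \left[\frac{1}{h}\,\log\left(1 + \frac{\mathrm{Var}[R_k(h)]}{\left(1 + E[R_k(h)]\right)^2}\right)\right]^{1/2}. \tag{9.3.24} $$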


Therefore, a mean and standard deviation of 10% and 30%, respectively, for IBM's annual simple returns implies a value of 26.8% for \(\sigma\). While 30% and 26.8% may seem almost identical for most practical purposes, the former value implies a Black-Scholes price of $8.48 for a one-year call option with a $35 strike price on a $40 stock, whereas the latter value implies a Black-Scholes price of $8.10 on the same option, an economically significant difference.
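The arithmetic of this example can be reproduced with a few lines of Python. The text does not state the riskless rate behind the $8.48 and $8.10 figures; the sketch assumes a continuously compounded rate \(r = \log(1.05)\), i.e., 5% per year compounded annually. That rate is our assumption, chosen because it reproduces both quoted prices to the cent. The helper names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def sigma_from_simple_returns(mean, sd, h=1.0):
    """Evaluate (9.3.24): invert (9.3.22)-(9.3.23) for sigma."""
    return sqrt(log(1.0 + sd**2 / (1.0 + mean) ** 2) / h)

def bs_call(P, X, r, sigma, tau):
    """Black-Scholes call price (9.2.16)."""
    d1 = (log(P / X) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return P * Phi(d1) - X * exp(-r * tau) * Phi(d2)

sigma_hat = sigma_from_simple_returns(mean=0.10, sd=0.30)  # approximately 0.268
r = log(1.05)  # ASSUMPTION: 5% per year, annually compounded; not stated in the text
naive_price = bs_call(40.0, 35.0, r, 0.30, 1.0)            # uses the simple-return s.d.
corrected_price = bs_call(40.0, 35.0, r, sigma_hat, 1.0)   # uses sigma from (9.3.24)
print(f"sigma = {sigma_hat:.4f}, prices: {naive_price:.2f} vs {corrected_price:.2f}")
```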

Since most published statistics for equity returns are based on simple returns, (9.3.24) is a useful formula to obtain a quick "ballpark" estimate of \(\sigma\) when historical data are not readily available. If historical data are available, it is a simple matter to compute the maximum likelihood estimator of \(\sigma\) using continuously compounded returns, as discussed in Sections 9.3.1 and 9.3.3.
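In particular, applying Itô's Lemma to \(\log P(t)\) and substituting (9.2.2) for \(dP\) yields

$$ d\log P = \left(\mu - \tfrac{1}{2}\sigma^2\right)dt + \sigma\,dB \equiv \alpha\,dt + \sigma\,dB, \tag{9.3.25} $$

where \(\alpha \equiv \mu - \frac{1}{2}\sigma^2\). Therefore, continuously compounded returns \(r_k(h) \equiv \log(P_k/P_{k-1})\) are IID normal random variables with mean \(\alpha h\) and variance \(\sigma^2 h\); hence the sample variance of \(r_k(h)/\sqrt{h}\) should be a good estimator of \(\sigma^2\). In fact, the sample variance of \(r_k(h)/\sqrt{h}\), computed with divisor \(n\), is the maximum likelihood estimator of \(\sigma^2\). More formally, under (9.3.25) the log-likelihood function of a sample of continuously compounded returns \(r_1(h), \ldots, r_n(h)\) is

$$ \mathcal{L}(\alpha, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2 h\right) - \frac{1}{2\sigma^2 h}\sum_{k=1}^{n}\big(r_k(h) - \alpha h\big)^2, \tag{9.3.26} $$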
and in this case the maximum likelihood estimators for \(\alpha\) and \(\sigma^2\) can be obtained in closed form:

$$ \hat\alpha = \frac{1}{nh}\sum_{k=1}^{n} r_k(h), \tag{9.3.27} $$

$$ \hat\sigma^2 = \frac{1}{nh}\sum_{k=1}^{n}\big(r_k(h) - \hat\alpha h\big)^2. \tag{9.3.28} $$


Moreover, because the \(r_k(h)\)'s are IID normal random variables under the dynamics (9.2.2), the regularity conditions for the consistency and asymptotic normality of the estimator \(\hat\sigma^2\) are satisfied (see the Appendix, Section A.4). Therefore, we are assured that \(\hat\alpha\) and \(\hat\sigma^2\) are asymptotically efficient in the class of CUAN estimators, with asymptotic covariance matrix given by (9.3.8).
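A minimal sketch of (9.3.26)-(9.3.28) on simulated data follows. The simulation settings (\(\mu = 0.10\), \(\sigma = 0.30\), ten years of daily data with \(h = 1/252\)) are illustrative assumptions, as are the helper names. The code computes the closed-form estimators and the log-likelihood, so one can confirm that perturbing either parameter away from \((\hat\alpha, \hat\sigma^2)\) lowers \(\mathcal{L}\).

```python
import random
from math import exp, log, pi, sqrt

def log_returns(prices):
    """Continuously compounded returns r_k(h) = log(P_k / P_{k-1})."""
    return [log(b / a) for a, b in zip(prices, prices[1:])]

def mle(prices, h):
    """Closed-form maximum likelihood estimators (9.3.27)-(9.3.28)."""
    r = log_returns(prices)
    n = len(r)
    alpha_hat = sum(r) / (n * h)
    sigma2_hat = sum((x - alpha_hat * h) ** 2 for x in r) / (n * h)
    return alpha_hat, sigma2_hat

def loglik(prices, h, alpha, sigma2):
    """Log-likelihood (9.3.26) of the continuously compounded returns."""
    r = log_returns(prices)
    n = len(r)
    return (-0.5 * n * log(2 * pi * sigma2 * h)
            - sum((x - alpha * h) ** 2 for x in r) / (2 * sigma2 * h))

# Simulated daily prices from (9.2.2); parameter values are illustrative.
rng = random.Random(3)
h, mu, sigma, n = 1 / 252, 0.10, 0.30, 2520
prices = [100.0]
for _ in range(n):
    step = (mu - 0.5 * sigma**2) * h + sigma * sqrt(h) * rng.gauss(0.0, 1.0)
    prices.append(prices[-1] * exp(step))

alpha_hat, sigma2_hat = mle(prices, h)
print(f"alpha_hat = {alpha_hat:.4f}, sigma2_hat = {sigma2_hat:.4f}")
```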


Irregularly Sampled Data

To see why irregularly sampled data pose no problems for continuous-time processes, observe that the sampling interval \(h\) is arbitrary and can change mid-sample without affecting the functional form of the likelihood function (9.3.26). Suppose, for example, we measure returns annually for the first \(n_1\) observations and then monthly for the next \(n_2\) observations. If \(h\) is measured in units of one year, so that \(h = 1\) indicates a one-year holding period, the maximum likelihood estimator of \(\sigma^2\) for the \(n_1 + n_2\) observations is given by


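$$ \hat\sigma^2 = \frac{1}{n_1 + n_2}\left[\sum_{k=1}^{n_1}\big(r_k(1) - \hat\alpha\big)^2 + 12\sum_{k=n_1+1}^{n_1+n_2}\Big(r_k\big(\tfrac{1}{12}\big) - \frac{\hat\alpha}{12}\Big)^2\right], \tag{9.3.29} $$

where

$$ \hat\alpha = \frac{1}{n_1 + n_2/12}\left[\sum_{k=1}^{n_1} r_k(1) + \sum_{k=n_1+1}^{n_1+n_2} r_k\big(\tfrac{1}{12}\big)\right]. $$

Observe that the second term of (9.3.29) may be rewritten as

$$ \frac{n_2}{n_1 + n_2}\left[\frac{12}{n_2}\sum_{k=n_1+1}^{n_1+n_2}\Big(r_k\big(\tfrac{1}{12}\big) - \frac{\hat\alpha}{12}\Big)^2\right], $$

in which the bracketed quantity is simply the variance estimator of monthly continuously compounded returns, rescaled to an annual frequency.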
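The sketch below, using simulated returns with illustrative parameters, computes (9.3.29) directly and compares it with the general maximum likelihood formulas for arbitrary interval lengths \(h_k\), namely \(\hat\alpha = \sum_k r_k / \sum_k h_k\) and \(\hat\sigma^2 = \frac{1}{n}\sum_k (r_k - \hat\alpha h_k)^2/h_k\), which follow from maximizing (9.3.26) with \(h\) replaced by \(h_k\). The two should agree up to rounding error. Function names are hypothetical.

```python
import random
from math import sqrt

def mle_general(returns, intervals):
    """MLEs for d log P = alpha dt + sigma dB with arbitrary interval lengths h_k."""
    alpha = sum(returns) / sum(intervals)
    sigma2 = sum((r - alpha * h) ** 2 / h for r, h in zip(returns, intervals)) / len(returns)
    return alpha, sigma2

def mle_annual_then_monthly(annual, monthly):
    """(9.3.29): n1 annual returns (h = 1) followed by n2 monthly returns (h = 1/12)."""
    n1, n2 = len(annual), len(monthly)
    alpha = (sum(annual) + sum(monthly)) / (n1 + n2 / 12)  # total return / total time
    annual_part = sum((x - alpha) ** 2 for x in annual)
    monthly_var_annualized = 12 * sum((x - alpha / 12) ** 2 for x in monthly) / n2
    sigma2 = (annual_part + n2 * monthly_var_annualized) / (n1 + n2)
    return alpha, sigma2

# Simulated log returns with alpha = 0.05, sigma = 0.25 (illustrative values).
rng = random.Random(4)
alpha0, sigma0 = 0.05, 0.25
annual = [rng.gauss(alpha0, sigma0) for _ in range(20)]
monthly = [rng.gauss(alpha0 / 12, sigma0 * sqrt(1 / 12)) for _ in range(240)]

a_mixed, s2_mixed = mle_annual_then_monthly(annual, monthly)
a_gen, s2_gen = mle_general(annual + monthly, [1.0] * 20 + [1 / 12] * 240)
print(f"(9.3.29): {s2_mixed:.5f}; general formula: {s2_gen:.5f}")
```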

The ease with which irregularly sampled data can be accommodated is one of the greatest advantages of continuous-time stochastic processes. However, this advantage comes at some cost: this great flexibility is the result of strong parametric restrictions that each continuous-time process imposes on all its finite-dimensional distributions (see Section 9.1.1 for the definition of a finite-dimensional distribution). In particular, the stochastic differential equation (9.3.25) imposes independence and normality on all increments of \(\log P(t)\) and linearity of the mean and variance in the increment interval; hence the continuously compounded return between a Friday and a Monday must have three times the mean and three times the variance of the continuously compounded return between a Tuesday and a Wednesday, the Tuesday/Thursday return must have the same distribution as the Saturday/Monday return, and so on. In fact, the specification of a continuous-time stochastic process such as (9.3.25) includes an infinity of parametric assumptions. Therefore, the convenience that continuous-time stochastic processes afford in dealing with irregular sampling should be weighed carefully against the many implicit assumptions that come with it.


Continuous-Record Asymptotics
Since we can estimate \(\alpha\) and \(\sigma^2\) from arbitrarily sampled data, a natural question to consider is how to sample returns so as to obtain the "best" estimator. Are 10 annual observations preferable to 100 daily observations, or should we match the sampling interval to the horizon we are ultimately interested in, e.g., the time-to-maturity of the option we wish to value? To address these issues, consider again a sample of \(n + 1\) prices \(P_0, P_1, \ldots, P_n\) equally spaced at intervals of length \(h\) over the fixed time span \([0, T]\), so that \(P_k \equiv P(kh)\), \(k = 0, \ldots, n\), and \(T = nh\). The asymptotic variance of the maximum likelihood estimator of \([\alpha\ \sigma^2]'\) is then given by (9.3.7), which may be evaluated explicitly in this case as:


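\[ \mathrm{Var}[\hat{\alpha}] \approx \frac{\sigma^2}{T} \tag{9.3.30} \]

\[ \mathrm{Var}[\hat{\sigma}^2] \approx \frac{2\sigma^4}{n}. \tag{9.3.31} \]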


Observe that (9.3.31) does not depend on the sampling interval h. As
n increases without bound while T is fixed (hence h decreases to 0), σ̂²
becomes more precise. This suggests that the "best" estimator for σ², the one
with the smallest asymptotic variance, is the one based on as many observations
as possible, regardless of their sampling interval.

Interestingly, this result does not hold for the estimator of α, whose
asymptotic variance depends on T and not n. More frequent sampling
within a fixed time span (often called continuous-record asymptotics) will
not increase the precision of α̂, and the "best" estimator for α is one based
on as long a time span as possible.
Table 9.2a. Asymptotic standard errors for α̂.

                                  Sampling interval h (years)
      n     1/2     1/4     1/8    1/16    1/32    1/64   1/128   1/256   1/512  1/1024
      2  0.2000  0.2828  0.4000  0.5657  0.8000  1.1314  1.6000  2.2627  3.2000  4.5255
      4  0.1414  0.2000  0.2828  0.4000  0.5657  0.8000  1.1314  1.6000  2.2627  3.2000
      8  0.1000  0.1414  0.2000  0.2828  0.4000  0.5657  0.8000  1.1314  1.6000  2.2627
     16  0.0707  0.1000  0.1414  0.2000  0.2828  0.4000  0.5657  0.8000  1.1314  1.6000
     32  0.0500  0.0707  0.1000  0.1414  0.2000  0.2828  0.4000  0.5657  0.8000  1.1314
     64  0.0354  0.0500  0.0707  0.1000  0.1414  0.2000  0.2828  0.4000  0.5657  0.8000
    128  0.0250  0.0354  0.0500  0.0707  0.1000  0.1414  0.2000  0.2828  0.4000  0.5657
    256  0.0177  0.0250  0.0354  0.0500  0.0707  0.1000  0.1414  0.2000  0.2828  0.4000
    512  0.0125  0.0177  0.0250  0.0354  0.0500  0.0707  0.1000  0.1414  0.2000  0.2828
  1,024  0.0088  0.0125  0.0177  0.0250  0.0354  0.0500  0.0707  0.1000  0.1414  0.2000

Asymptotic standard error of α̂ for various values of n and h, assuming a base interval of h = 1
year and σ = 0.20. Recall that T = nh; hence the values n = 64 and h = 1/16 imply a sample
of 64 observations equally spaced over 4 years.


Tables 9.2a and 9.2b illustrate the sharp differences between α̂ and σ̂² by
reporting asymptotic standard errors for the two estimators for various values
of n and h, assuming a base interval of h = 1 year and σ = 0.20. Recall that
T = nh; hence the values n = 64 and h = 1/16 imply a sample of 64 observations
equally spaced over 4 years.

In Table 9.2a the standard error of α̂ declines as we move down the table
(corresponding to increasing T), increases as we move right (corresponding
to decreasing T), and remains the same along the diagonals (corresponding
to a fixed T). For purposes of estimating α, having 2 observations measured
at 6-month intervals yields as accurate an estimator as having 1,024
observations measured four times per day.

In Table 9.2b the pattern is quite different. The entries are identical
across the columns: only the number of observations n matters for
determining the standard error of σ̂². In this case, a sample of 1,024 observations
measured four times a day is an order of magnitude more accurate than a
sample of 2 observations measured at six-month intervals.
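The entries of Tables 9.2a and 9.2b can be regenerated in a few lines. This sketch assumes that (9.3.30) and (9.3.31) take the forms Var[α̂] ≈ σ²/T and Var[σ̂²] ≈ 2σ⁴/n, which is what the tabulated values imply. For example, 0.20/√1 = 0.2000 for n = 2, h = 1/2, and 0.04·√(2/1024) ≈ 0.0018. The function names are illustrative.

```python
import math


def se_alpha_hat(n: int, h: float, sigma: float) -> float:
    """Asymptotic s.e. of the drift MLE, sqrt(sigma^2 / T) with T = n * h."""
    return sigma / math.sqrt(n * h)


def se_sigma2_hat(n: int, sigma: float) -> float:
    """Asymptotic s.e. of the variance MLE, sqrt(2 sigma^4 / n); h plays no role."""
    return math.sqrt(2.0 * sigma ** 4 / n)


sigma = 0.20
intervals = [2.0 ** -j for j in range(1, 11)]   # h = 1/2, 1/4, ..., 1/1024 year
sizes = [2 ** j for j in range(1, 11)]          # n = 2, 4, ..., 1024
for n in sizes:
    row_a = " ".join(f"{se_alpha_hat(n, h, sigma):.4f}" for h in intervals)
    print(f"{n:5d} | {row_a} | s.e.(sigma2_hat) = {se_sigma2_hat(n, sigma):.4f}")
```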

The consistency of σ̂² and the inconsistency of α̂ under continuous-record
asymptotics, first observed by Merton (1980) for the case of geometric
Brownian motion, hold for general diffusion processes and are an artifact
of the non-differentiability of diffusion sample paths (see Bertsimas, Kogan,
and Lo [1996] for further discussion). In fact, if we observe a continuous
record of P(t) over any finite interval, we can recover without error the
diffusion coefficient σ(·), even in cases where it is time-varying. Of course, in
practice we never observe continuous sample paths. The notion of continuous
time is an approximation, and the magnitude of the approximation
error varies from one application to the next.

Table 9.2b. Asymptotic standard errors for σ̂².

                                  Sampling interval h (years)
      n     1/2     1/4     1/8    1/16    1/32    1/64   1/128   1/256   1/512  1/1024
      2  0.0400  0.0400  0.0400  0.0400  0.0400  0.0400  0.0400  0.0400  0.0400  0.0400
      4  0.0283  0.0283  0.0283  0.0283  0.0283  0.0283  0.0283  0.0283  0.0283  0.0283
      8  0.0200  0.0200  0.0200  0.0200  0.0200  0.0200  0.0200  0.0200  0.0200  0.0200
     16  0.0141  0.0141  0.0141  0.0141  0.0141  0.0141  0.0141  0.0141  0.0141  0.0141
     32  0.0100  0.0100  0.0100  0.0100  0.0100  0.0100  0.0100  0.0100  0.0100  0.0100
     64  0.0071  0.0071  0.0071  0.0071  0.0071  0.0071  0.0071  0.0071  0.0071  0.0071
    128  0.0050  0.0050  0.0050  0.0050  0.0050  0.0050  0.0050  0.0050  0.0050  0.0050
    256  0.0035  0.0035  0.0035  0.0035  0.0035  0.0035  0.0035  0.0035  0.0035  0.0035
    512  0.0025  0.0025  0.0025  0.0025  0.0025  0.0025  0.0025  0.0025  0.0025  0.0025
  1,024  0.0018  0.0018  0.0018  0.0018  0.0018  0.0018  0.0018  0.0018  0.0018  0.0018

Asymptotic standard error of σ̂² for various values of n and h, assuming a base interval of h = 1
year and σ = 0.20. Recall that T = nh; hence the values n = 64 and h = 1/16 imply a sample
of 64 observations equally spaced over 4 years.
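A small Monte Carlo experiment makes the contrast tangible. The sketch is illustrative only: the function name is hypothetical, and the drift of 0.1 and the seed are arbitrary. It simulates arithmetic Brownian motion log-prices over a fixed one-year span, sampled first 16 and then 256 times, and reports the dispersion of the two MLEs across replications. The spread of α̂ stays near σ/√T = 0.20 regardless of n. The spread of σ̂² shrinks roughly like σ²√(2/n).

```python
import math
import random
import statistics


def simulate_mle_spread(n: int, T: float, alpha: float, sigma: float,
                        reps: int, seed: int = 0):
    """Monte Carlo standard deviations of (alpha_hat, sigma2_hat) when n equally
    spaced log-price increments over the fixed span [0, T] are drawn from
    arithmetic Brownian motion with drift alpha and volatility sigma."""
    rng = random.Random(seed)
    h = T / n
    alpha_draws, sigma2_draws = [], []
    for _ in range(reps):
        r = [rng.gauss(alpha * h, sigma * math.sqrt(h)) for _ in range(n)]
        a_hat = sum(r) / T                       # depends only on p(T) - p(0)
        s2_hat = sum((x - a_hat * h) ** 2 for x in r) / (n * h)
        alpha_draws.append(a_hat)
        sigma2_draws.append(s2_hat)
    return statistics.stdev(alpha_draws), statistics.stdev(sigma2_draws)


# Same one-year span, sampled 16 versus 256 times.
for n in (16, 256):
    sd_a, sd_s2 = simulate_mle_spread(n, T=1.0, alpha=0.1, sigma=0.2, reps=2000, seed=1)
    print(n, round(sd_a, 3), round(sd_s2, 4))
```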

As the sampling interval becomes finer, other problems may arise, such
as the effects of the bid-ask spread, nonsynchronous trading, and related
market microstructure issues (see Chapter 3). For example, suppose we
decide to use daily data to estimate σ². How should weekends and holidays
be treated? Some choose to ignore them altogether, which is tantamount
to assuming that prices exhibit no volatility when markets are closed, with
the counterfactual implication that Friday's closing price is always equal to
Monday's opening price. Alternatively, we may use (9.3.29) in accounting
for weekends, but such a procedure implicitly assumes that the price process
exhibits the same volatility when markets are closed. This implies that the
Friday-to-Monday return has three times the variance of the Monday-to-Tuesday
return, which is also counterfactual, as French and Roll (1986) have shown.
The benefits of more frequent sampling must be weighed against the
costs, as measured by the types of biases associated with more finely sampled
data. Unfortunately, there are no general guidelines as to how to make
such a tradeoff. It must be made on an individual basis, with the particular
application and data at hand.

See Bertsimas, Kogan, and Lo (1996), Lo and MacKinlay (1989), Perron (1991), and
Shiller and Perron (1985) for a more detailed analysis of the interactions between sampling
interval and sample size.
9.3.3 Quantifying the Precision of Option Price Estimators


Once the maximum likelihood estimator θ̂ of the underlying asset's
parameters is obtained, the maximum likelihood estimator of the option price
G may be constructed by inserting θ̂ into the option-pricing formula (or
into the numerical algorithm that generates the price). Since θ̂ contains
estimation error, Ĝ ≡ G(P(t), θ̂) will also contain estimation error. For
trading and hedging applications it is therefore imperative that we have some
measure of its precision. This can be constructed by applying a first-order Taylor
expansion to G(θ̂) to calculate its asymptotic distribution (see Section A.4
of the Appendix):


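\[ \sqrt{n}\,(\hat{G} - G) \overset{a}{\sim} \mathcal{N}\big(0, V_G(\theta)\big) \tag{9.3.32} \]

\[ V_G(\theta) \equiv \left(\frac{\partial G(P(t),\theta)}{\partial \theta}\right)^{\prime} \mathcal{I}^{-1}(\theta) \left(\frac{\partial G(P(t),\theta)}{\partial \theta}\right), \tag{9.3.33} \]

where G ≡ G(P(t), θ) and 𝓘(θ) is the information matrix defined in (9.3.7).
Therefore, for large n the variance of Ĝ may be approximated by:

\[ \mathrm{Var}[\hat{G}] \approx \frac{1}{n}\,V_G(\theta), \tag{9.3.34} \]

and V_G may be estimated in the natural way:

\[ \hat{V}_G \equiv V_G(\hat{\theta}) = \left(\frac{\partial G(P(t),\hat{\theta})}{\partial \theta}\right)^{\prime} \mathcal{I}^{-1}(\hat{\theta}) \left(\frac{\partial G(P(t),\hat{\theta})}{\partial \theta}\right). \tag{9.3.35} \]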
In much the same way, the precision of the estimators of an option's
sensitivities to its arguments (the option's delta, gamma, theta, rho, and vega;
see Section 9.2) may also be readily quantified.
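Expressions (9.3.32) through (9.3.35) are an instance of the delta method. It is easy to apply numerically when ∂G/∂θ has no convenient closed form, for instance when G comes from a numerical pricing algorithm. The helper below is a hypothetical sketch. It approximates the gradient by central differences and takes the asymptotic covariance of √n(θ̂ − θ), i.e., the inverse information matrix, as a user-supplied input.

```python
from typing import Callable, List, Sequence


def delta_method_variance(g: Callable[[Sequence[float]], float],
                          theta: Sequence[float],
                          avar: List[List[float]],
                          n: int,
                          rel_step: float = 1e-5) -> float:
    """Approximate Var[g(theta_hat)] by (1/n) * grad' avar grad.

    avar is the asymptotic covariance of sqrt(n) (theta_hat - theta), i.e. the
    inverse information matrix; grad is approximated by central differences.
    """
    k = len(theta)
    grad = []
    for i in range(k):
        step = rel_step * max(1.0, abs(theta[i]))
        up, dn = list(theta), list(theta)
        up[i] += step
        dn[i] -= step
        grad.append((g(up) - g(dn)) / (2.0 * step))
    quad = sum(grad[i] * avar[i][j] * grad[j] for i in range(k) for j in range(k))
    return quad / n


# Toy check: g(theta) = theta_0 * theta_1 has gradient (theta_1, theta_0).
print(delta_method_variance(lambda t: t[0] * t[1], [2.0, 3.0],
                            [[1.0, 0.0], [0.0, 4.0]], n=10))
```

In the toy check the gradient is (3, 2), so the quadratic form is 9·1 + 4·4 = 25, and dividing by n = 10 gives about 2.5.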


The Black-Scholes Case 
As an illustration of these results, consider the case of Black and Scholes
(1973), in which P(t) follows a geometric Brownian motion,

\[ dP(t) = \alpha P(t)\,dt + \sigma P(t)\,dB(t), \tag{9.3.36} \]

and in which the only parameter of interest is σ. Since the maximum
likelihood estimator σ̂² of σ² has an asymptotic distribution given by
√n(σ̂² − σ²) ~a N(0, 2σ⁴) (see Section 9.3.2), the asymptotic distribution
of the Black-Scholes call option price estimator Ĝ is

\[ \sqrt{n}\,(\hat{G} - G) \overset{a}{\sim} \mathcal{N}(0, V_{BS}), \qquad V_{BS} = \frac{P^2(t)\,(T-t)\,\sigma^2\,\phi^2(d_1)}{2}, \tag{9.3.37} \]

where φ(·) is the standard normal probability density function and d₁ is
given in (9.2.17).

This follows from the principle of invariance: the maximum likelihood estimator of a
nonlinear function of a parameter vector is the nonlinear function of the parameter vector's
maximum likelihood estimator. See, for example, Zehna (1966).
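The sketch below checks (9.3.37) numerically. It codes V_BS directly and compares it with the general delta-method recipe, 2σ⁴(∂G/∂σ²)², using a finite-difference derivative. The function names and the example inputs (a six-month at-the-money call, σ = 20%, r = 5%, n = 250) are hypothetical. Time-to-maturity and the rate must be in the same units as the sample used to estimate σ².

```python
import math


def _d1(P, X, tau, r, sigma2):
    s = math.sqrt(sigma2 * tau)
    return (math.log(P / X) + r * tau) / s + 0.5 * s


def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(P, X, tau, r, sigma2):
    """Black-Scholes call price written as a function of sigma2 = sigma^2."""
    d1 = _d1(P, X, tau, r, sigma2)
    d2 = d1 - math.sqrt(sigma2 * tau)
    return P * _norm_cdf(d1) - X * math.exp(-r * tau) * _norm_cdf(d2)


def v_bs(P, X, tau, r, sigma2):
    """V_BS = P^2 (T-t) sigma^2 phi(d1)^2 / 2, i.e. (dG/dsigma2)^2 * 2 sigma^4."""
    phi = math.exp(-0.5 * _d1(P, X, tau, r, sigma2) ** 2) / math.sqrt(2.0 * math.pi)
    return 0.5 * P ** 2 * tau * sigma2 * phi ** 2


def price_standard_error(P, X, tau, r, sigma2, n):
    """Approximate standard error of G_hat based on n observations: sqrt(V_BS / n)."""
    return math.sqrt(v_bs(P, X, tau, r, sigma2) / n)


print(price_standard_error(40.0, 40.0, 0.5, 0.05, 0.04, 250))
```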

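From the asymptotic variance V_BS given in (9.3.37), some simple
comparative static results may be derived:

\[ \frac{\partial V_{BS}}{\partial P} = P(T-t)\,\sigma^2\phi^2(d_1)\left(1 - \frac{d_1}{\sigma\sqrt{T-t}}\right), \qquad \frac{\partial V_{BS}}{\partial X} = \frac{P^2\sigma\sqrt{T-t}}{X}\,d_1\,\phi^2(d_1) \tag{9.3.38} \]

\[ \frac{\partial V_{BS}}{\partial (T-t)} = \frac{P^2\sigma^2\phi^2(d_1)}{2}\left(1 + \frac{\log^2(P/X) - (r + \sigma^2/2)^2(T-t)^2}{\sigma^2(T-t)}\right). \tag{9.3.39} \]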
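The following inequalities may then be established:

\[ \frac{\partial V_{BS}}{\partial P} > 0 \quad \text{if} \quad \frac{P}{X} < c_1 \tag{9.3.40} \]

\[ \frac{\partial V_{BS}}{\partial X} < 0 \quad \text{if} \quad \frac{P}{X} < c_2 \tag{9.3.41} \]

\[ \frac{\partial V_{BS}}{\partial (T-t)} > 0 \quad \text{if} \quad \frac{P}{X} < c_2 \ \text{ or } \ \frac{P}{X} > \frac{1}{c_2} \tag{9.3.42} \]

\[ \frac{\partial V_{BS}}{\partial (T-t)} > 0 \quad \text{if} \quad c_3 < 1 \tag{9.3.43} \]

where

\[ c_1 \equiv e^{(\sigma^2/2 - r)(T-t)}, \qquad c_2 \equiv e^{-(r + \sigma^2/2)(T-t)}, \qquad c_3 \equiv \frac{(r + \sigma^2/2)\sqrt{T-t}}{\sigma}. \]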


Inequality (9.3.40) shows that the accuracy of Ĝ decreases with the level of
the stock price as long as the ratio of the stock price to the strike price is less
than c₁. However, as the stock price increases beyond Xc₁, the accuracy of
Ĝ begins to increase. Inequality (9.3.41) shows a similar pattern for V_BS with
respect to the strike price.

Table 9.3. Cutoff values for comparative statics of V_BS.

  T−t (weeks)      c1       c2     1/c2       c3
            1   1.0015   0.9967   1.0033   0.0482
            2   1.0029   0.9933   1.0067   0.0682
            4   1.0059   0.9867   1.0135   0.0964
            8   1.0118   0.9736   1.0271   0.1363
           12   1.0177   0.9607   1.0409   0.1670
           24   1.0358   0.9229   1.0835   0.2361
           48   1.0729   0.8518   1.1740   0.3339

Interestingly, inequality (9.3.43) does not depend on either the stock or
strike prices. Hence for shorter-maturity options the accuracy of Ĝ will
decline as the time-to-maturity T−t increases. But even if (9.3.43) is not satisfied,
the accuracy of Ĝ may still decline with T−t if (9.3.42) holds.

Table 9.3 reports values of c₁ through c₃ for various values of T−t, assuming
an annual interest rate of 5% and an annual standard deviation of 50%,
corresponding to weekly values of r = log(1.05)/52 and σ = 0.50/√52.
Given the numerical values of c₂ and 1/c₂, (9.3.42) will be satisfied by
options that are far enough in- or out-of-the-money. For example, if the stock
price is $40, then options maturing in 24 weeks with strike prices greater
than $42.09 or less than $38.02 will be more precisely estimated as the
time-to-maturity declines. This is consistent with the finding of MacBeth and
Merville (1979, 1980) that biases of in- and out-of-the-money options
decrease as the time-to-maturity declines. It also supports the observation
by Gultekin, Rogalski, and Tinic (1982) that the Black-Scholes formula is
more accurate for short-lived options.
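The cutoffs in Table 9.3 can be reproduced from the derivatives of V_BS. ∂V_BS/∂P changes sign where d₁ = σ√(T−t). ∂V_BS/∂X changes sign where d₁ = 0. ∂V_BS/∂(T−t) is positive for every P/X whenever (r + σ²/2)√(T−t)/σ < 1. The sketch encodes those three conditions. They come from our own algebra, so treat the sketch as a check rather than the authors' code. With the weekly inputs used for the table, it matches the tabulated columns to four decimals. The function name is hypothetical.

```python
import math


def comparative_static_cutoffs(tau, r, sigma2):
    """Return (c1, c2, 1/c2, c3) for the comparative statics of V_BS.

    c1: P/X below which dV_BS/dP > 0 (from d1 = sigma * sqrt(tau));
    c2: P/X at which dV_BS/dX changes sign (from d1 = 0);
    c3: (r + sigma^2/2) sqrt(tau) / sigma; c3 < 1 gives dV_BS/dtau > 0 for all P/X.
    """
    c1 = math.exp((0.5 * sigma2 - r) * tau)
    c2 = math.exp(-(r + 0.5 * sigma2) * tau)
    c3 = (r + 0.5 * sigma2) * math.sqrt(tau) / math.sqrt(sigma2)
    return c1, c2, 1.0 / c2, c3


r_week = math.log(1.05) / 52        # 5% annual rate, in weekly units
sigma2_week = 0.50 ** 2 / 52        # 50% annual volatility, in weekly units
for weeks in (1, 2, 4, 8, 12, 24, 48):
    print(weeks, [round(c, 4) for c in comparative_static_cutoffs(weeks, r_week, sigma2_week)])
```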

Through first-order Taylor expansions (see Section A.4 of the Appendix),
the accuracy of the option's sensitivities (9.2.24)-(9.2.28) can also be
readily derived, and thus the accuracy of dynamic hedging strategies can be
measured. For convenience, we report the asymptotic variances of these
quantities in Table 9.4.
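The rule in the note to Table 9.4 can be applied numerically to any sensitivity f(σ²). The hypothetical helper below does so for the call's delta, N(d₁), using a central-difference derivative. As a cross-check, differentiating by hand gives ∂N(d₁)/∂σ² = −φ(d₁)d₂/(2σ²), so the asymptotic variance of the delta estimator is φ²(d₁)d₂²/2. That closed form is our own derivation. It is offered for verification, not as a transcription of the table.

```python
import math


def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_delta(P, X, tau, r, sigma2):
    """Black-Scholes call delta N(d1), written as a function of sigma2 = sigma^2."""
    s = math.sqrt(sigma2 * tau)
    d1 = (math.log(P / X) + r * tau) / s + 0.5 * s
    return _norm_cdf(d1)


def sensitivity_avar(f, P, X, tau, r, sigma2, step=1e-7):
    """Asymptotic variance of sqrt(n)(f(sigma2_hat) - f(sigma2)) = 2 sigma^4 (df/dsigma2)^2,
    with the derivative approximated by central differences."""
    dfd = (f(P, X, tau, r, sigma2 + step) - f(P, X, tau, r, sigma2 - step)) / (2.0 * step)
    return 2.0 * sigma2 ** 2 * dfd ** 2


print(sensitivity_avar(bs_delta, 40.0, 35.0, 0.5, 0.05, 0.04))
```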


9.3.4 The Effects of Asset Return Predictability 
The martingale pricing method described in Section 9.2.2 exploits the fact
that the pricing equation (9.2.15) is independent of the drift of P(t). Since


There are, of course, other possible explanations for such empirical regularities, such as
the presence of stochastic volatility or a misspecification of the stock price dynamics.
Table 9.4. Asymptotic variances of Black-Scholes call price sensitivity estimators.


Estimator Asymptotic Variance 
A pra) di 
P 21e 1% 
R (а-о tata) 
v уан? 


These asymptotic variances are based on the assumption that the variance estimator σ̂² is the
maximum likelihood estimator, which has asymptotic distribution √n(σ̂² − σ²) ~a N(0, 2σ⁴);
hence √n(f(σ̂²) − f(σ²)) ~a N(0, 2σ⁴(∂f(σ²)/∂σ²)²), where f(σ²) is the option sensitivity.
Following standard conventions, the expressions reported in the table are the asymptotic
variances of √n(f(σ̂²) − f(σ²)) and must be divided by the sample size n to obtain the asymptotic
variances of the (unnormalized) sensitivities f(σ̂²).


the drift does not enter into the PDE (9.2.15), for purposes of pricing options
it may be set to any arbitrary function or constant without loss of generality
(subject to some regularity conditions). In particular, under the equivalent
martingale measure in which all asset prices follow martingales, the option's
price is simply the present discounted value of its expected payoff at maturity.
The expectation is computed with respect to the risk-neutralized process
P*(t):

\[ dP^*(t) = r\,P^*(t)\,dt + \sigma P^*(t)\,dB(t) \tag{9.3.44} \]

\[ d\log P^*(t) \equiv dp^*(t) = \left(r - \frac{\sigma^2}{2}\right)dt + \sigma\,dB(t). \tag{9.3.45} \]

Although the risk-neutralized process is not empirically observable, it is
nevertheless an extremely convenient tool for evaluating the price of an
option on the stock with a data-generating process given by P(t).
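The risk-neutral recipe can be illustrated by Monte Carlo: simulate the terminal price under (9.3.45), average the call payoff, and discount at r. The sketch never uses the drift of the actual price process. Its function name is hypothetical, and antithetic sampling is simply one choice of variance reduction. With the inputs of the implied-volatility example later in this section (P = $40, X = $35, one year, 5% simple rate, σ = 0.20), it should land within sampling error of the Black-Scholes price of about $7.382.

```python
import math
import random


def mc_call_risk_neutral(P0, X, tau, r, sigma, n_pairs=200_000, seed=0):
    """Discounted expected call payoff under the risk-neutralized dynamics
    (9.3.45): log P*(T) ~ N(log P0 + (r - sigma^2/2) tau, sigma^2 tau).
    The drift of the data-generating process appears nowhere."""
    rng = random.Random(seed)
    mean = math.log(P0) + (r - 0.5 * sigma ** 2) * tau
    sd = sigma * math.sqrt(tau)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        # Antithetic pair (z, -z) to reduce Monte Carlo noise.
        total += max(math.exp(mean + sd * z) - X, 0.0)
        total += max(math.exp(mean - sd * z) - X, 0.0)
    return math.exp(-r * tau) * total / (2 * n_pairs)


mc_price = mc_call_risk_neutral(40.0, 35.0, 1.0, math.log(1.05), 0.20, seed=42)
print(round(mc_price, 3))
```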
Moreover, the risk-neutral pricing approach yields the following
implication: as long as the diffusion coefficient for the log-price process is a fixed
constant σ, the Black-Scholes formula yields the correct option price
regardless of the specification and arguments of the drift. This holds more
generally for any derivative asset that can be priced purely by arbitrage
and whose underlying asset's log-price dynamics are described by an Itô
diffusion with a constant diffusion coefficient. In that case, the derivative
pricing formula is functionally independent of the drift and is determined
purely by the diffusion coefficient and the contract specifications of the
derivative asset.

However, under certain conditions it can be estimated; see, for example, Aït-Sahalia and
Lo (1996), Jackwerth and Rubinstein (1995), Rubinstein (1994), Shimko (1993), and Section
12.3.4 of Chapter 12.

This may seem paradoxical, since two stocks with the same σ but different
drifts will yield the same Black-Scholes price. Yet the stock with the larger
drift has a larger expected return, implying that a call option on that stock
is more likely to be in-the-money at maturity than a call option on the stock
with the smaller drift. The resolution of this paradox lies in the observation
that although the expected payoff of the call option on the larger-drift stock
is indeed larger, it must be discounted at a higher rate of return, one that
is commensurate with the risk and expected return of its underlying stock.
This higher discount rate exactly offsets the larger expected payoff, so
that the present value is the same as the price of the call on the stock with
the smaller drift. Therefore, the fact that the drift plays no direct role in
the Black-Scholes formula belies its importance. The same economic forces
that determine the expected returns of stocks, bonds, and other financial
assets are also at work in pricing options.

These considerations are less important when the drift is assumed to
be constant through time. But if expected returns are time-varying, so that
stock returns are predictable to some degree, this predictability must be
taken into account in estimating σ.


The Trending Ornstein-Uhlenbeck Process

To see how a time-varying drift can influence option prices, we follow Lo
and Wang's (1995) analysis by replacing the geometric Brownian motion
assumption (A3) of the Black-Scholes model with the following stochastic
differential equation for the log-price process p(t):

\[ dp(t) = \big(-\gamma\,(p(t) - \mu t) + \mu\big)\,dt + \sigma\,dB(t), \tag{9.3.46} \]

where

\[ \gamma > 0, \qquad p(0) = p_0, \qquad t \in [0, \infty). \]

Unlike the geometric Brownian motion dynamics of the original
Black-Scholes model, which implies that log-prices follow an arithmetic random
walk with IID normal increments, this log-price process is the sum of a
zero-mean stationary autoregressive Gaussian process (an Ornstein-Uhlenbeck
process) and a deterministic linear trend. We therefore call this the trending O-U
process. Rewriting (9.3.46) as

\[ d\big(p(t) - \mu t\big) = -\gamma\,\big(p(t) - \mu t\big)\,dt + \sigma\,dB(t) \tag{9.3.47} \]
shows that when p(t) deviates from its trend μt, it is pulled back at a rate
proportional to its deviation, where γ is the speed of adjustment. This reversion
to the trend induces predictability in the returns of this asset.

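To develop further intuition for the properties of (9.3.46), consider its
explicit solution:

\[ p(t) = p_0\,e^{-\gamma t} + \mu t + \sigma\int_0^t e^{-\gamma(t-s)}\,dB(s), \tag{9.3.48} \]

from which we can obtain the unconditional moments and co-moments of
continuously compounded τ-period returns r_τ(t) ≡ p(t) − p(t−τ):

\[ \mathrm{E}[r_\tau(t)] = \mu\tau \tag{9.3.49} \]

\[ \mathrm{Var}[r_\tau(t)] = \frac{\sigma^2}{\gamma}\left[1 - e^{-\gamma\tau}\right] \tag{9.3.50} \]

\[ \mathrm{Cov}[r_\tau(t_1), r_\tau(t_2)] = -\frac{\sigma^2}{2\gamma}\left[1 - e^{-\gamma\tau}\right]^2 e^{-\gamma(t_2 - t_1 - \tau)}, \qquad t_1 + \tau \le t_2 \tag{9.3.51} \]

\[ \mathrm{Corr}[r_\tau(t), r_\tau(t+\tau)] \equiv \rho_1(\tau) = -\frac{1}{2}\left[1 - e^{-\gamma\tau}\right]. \tag{9.3.52} \]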

Since (9.3.46) is a Gaussian process, the moments (9.3.49)-(9.3.51)
completely characterize the finite-dimensional distributions of r_τ(t) (see Section
9.1.1 for the definition of a finite-dimensional distribution). The arithmetic
Brownian motion, or random walk, is nonstationary and is often said to be
difference-stationary or a stochastic trend. The trending O-U process, by
contrast, is said to be trend-stationary, since its deviations from trend follow a
stationary process.
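The moments (9.3.50) through (9.3.52) can be checked by simulation. Sampled every τ, the detrended log-price p(t) − μt is an AR(1) with coefficient e^{−γτ}. The sketch therefore uses that exact discretization and starts it from the stationary distribution N(0, σ²/(2γ)), the setting in which these moments apply. The parameter values and function names are hypothetical.

```python
import math
import random


def simulate_trending_ou_returns(mu, gamma, sigma, tau, n, seed=0):
    """n consecutive tau-period log returns from the trending O-U process.

    Uses the exact AR(1) discretization of the detrended log-price
    X(t) = p(t) - mu*t, dX = -gamma X dt + sigma dB, with X started from its
    stationary law N(0, sigma^2 / (2 gamma)) so the stationary moments apply.
    """
    rng = random.Random(seed)
    phi = math.exp(-gamma * tau)
    stat_sd = sigma / math.sqrt(2.0 * gamma)
    shock_sd = stat_sd * math.sqrt(1.0 - phi ** 2)
    x_prev = rng.gauss(0.0, stat_sd)
    returns = []
    for _ in range(n):
        x = phi * x_prev + rng.gauss(0.0, shock_sd)
        returns.append(x - x_prev + mu * tau)   # r_tau = p(t) - p(t - tau)
        x_prev = x
    return returns


def lag1_autocorrelation(r):
    m = sum(r) / len(r)
    num = sum((r[i] - m) * (r[i + 1] - m) for i in range(len(r) - 1))
    den = sum((v - m) ** 2 for v in r)
    return num / den


gamma, sigma, tau = 0.5, 0.02, 1.0
rets = simulate_trending_ou_returns(0.0005, gamma, sigma, tau, 200_000, seed=3)
mean_r = sum(rets) / len(rets)
sample_var = sum((v - mean_r) ** 2 for v in rets) / (len(rets) - 1)
print("rho1:", round(lag1_autocorrelation(rets), 4),
      "theory:", round(-0.5 * (1 - math.exp(-gamma * tau)), 4))
print("var:", sample_var, "theory:", sigma ** 2 / gamma * (1 - math.exp(-gamma * tau)))
```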


Since we have conditioned on p(0) = p₀ in defining the detrended log-price process,
it is a slight abuse of terminology to call these moments "unconditional". However, in this
case the distinction is primarily semantic, since the conditioning variable is more of an initial
condition than an information variable. If we define the beginning of time as t = 0 and the
fully observable starting value of p(0) as p₀, then (9.3.49)-(9.3.52) are unconditional moments
relative to these initial conditions. We shall adopt this definition of an unconditional moment
throughout the remainder of this chapter.


An implication of trend-stationarity is that the variance of τ-period returns has a finite
limit as τ increases without bound, in this case σ²/γ, whereas this variance increases linearly
with τ under a random walk. While trend-stationary processes are often simpler to estimate,
they have been criticized as unrealistic models of financial asset prices, since they do not accord
well with the common intuition that longer-horizon asset returns exhibit more risk or that price
forecasts exhibit more uncertainty as the forecast horizon grows. However, if the source of such
intuition is empirical observation, it may well be consistent with trend-stationarity, since it is
now well known that for any finite set of data, trend-stationarity and difference-stationarity are
virtually indistinguishable (see, for example, Section 2.7 in Chapter 2, Campbell and Perron
[1991], Hamilton [1994, Chapters 17-18], and the many other "unit root" papers they cite).
Note that the first-order autocorrelation (9.3.52) of the trending O-U
increments is always less than or equal to zero, is bounded below by −½,
and approaches −½ as τ increases without bound. These prove to be
serious restrictions for many empirical applications. They motivate the
alternative processes introduced in Lo and Wang (1995), which have
considerably more flexible autocorrelation functions. But as an illustration of
the impact of serial correlation on option prices, the trending O-U process
is ideal. Despite the differences between the trending O-U process and an
arithmetic Brownian motion, both data-generating processes yield the same
risk-neutralized price process (9.3.44). Hence the Black-Scholes formula still
applies to options on stocks with log-price dynamics given by (9.3.46).

However, although the Black-Scholes formula is the same for (9.3.46),
the σ in (9.3.46) is not necessarily the same as the σ in the geometric
Brownian motion specification (9.2.2). Specifically, the two data-generating
processes (9.2.2) and (9.3.46) must fit the same price data. They are, after
all, two competing specifications of a single price process, the "true" DGP.
Therefore, in the presence of serial correlation, e.g., specification (9.3.46),
the numerical value for the Black-Scholes input σ will be different from its
value in the case of geometric Brownian motion (9.2.2).

To be concrete, denote by m(τ), s²[r_τ(t)], and ρ₁(τ) the unconditional
mean, variance, and first-order autocorrelation of {r_τ(t)}, respectively,
which may be defined without reference to any particular data-generating
process. The numerical values of these quantities may also be fixed without
reference to any particular data-generating process. All competing
specifications for the true data-generating process must come close to matching
these moments to be plausible descriptions of that data. (Of course, the
best specification is one that matches all the moments, in which case the
true data-generating process will have been discovered.) For the arithmetic
Brownian motion, this implies that the parameters (μ, σ²) must satisfy the
following relations:

\[ m(\tau) = \mu\tau \tag{9.3.53} \]

\[ s^2[r_\tau(t)] = \sigma^2\tau \tag{9.3.54} \]

\[ \rho_1(\tau) = 0. \tag{9.3.55} \]

From (9.3.54), we obtain the well-known result that the Black-Scholes input
σ² may be estimated by the sample variance of continuously compounded
returns r_τ(t), divided by τ.


Nevertheless, Lo and Wang (1995) provide a generalization of the trending O-U process that
contains stochastic trends, in which case the variance of returns will increase with the holding
period τ.

Of course, it must be assumed that the moments exist. However, even if they do not, a
similar but more involved argument may be based on location, scale, and association parameters.
In the case of the trending O-U process, the parameters (μ, γ, σ²) must
satisfy

\[ m(\tau) = \mu\tau \tag{9.3.56} \]

\[ s^2[r_\tau(t)] = \frac{\sigma^2}{\gamma}\left[1 - e^{-\gamma\tau}\right], \qquad \tau > 0 \tag{9.3.57} \]

\[ \rho_1(\tau) = -\frac{1}{2}\left[1 - e^{-\gamma\tau}\right]. \tag{9.3.58} \]

Observe that these relations must hold for the population values of the
parameters if the trending O-U process is to be a plausible description of the
DGP. Moreover, while (9.3.56)-(9.3.58) involve population values of the
parameters, they also have implications for estimation. In particular, under
the trending O-U specification, the sample variance of continuously
compounded returns is clearly not an appropriate estimator for σ².
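Holding the unconditional variance of returns fixed, the particular
value of σ² now depends on γ. Solving (9.3.57) and (9.3.58) for γ and
σ² yields:

\[ \gamma = -\frac{1}{\tau}\log\big(1 + 2\rho_1(\tau)\big) \tag{9.3.59} \]

\[ \sigma^2 = \frac{\gamma\, s^2[r_\tau(t)]}{1 - e^{-\gamma\tau}} = \frac{s^2[r_\tau(t)]}{\tau}\left[\frac{\gamma\tau}{1 - e^{-\gamma\tau}}\right], \tag{9.3.60} \]

which shows the dependence of σ² on γ explicitly.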

In the second equation of (9.3.60), σ² has been re-expressed as the
product of two terms. The first is the standard Black-Scholes input under the
assumption that arithmetic Brownian motion is the data-generating process.
The second is an adjustment factor required by the trending O-U
specification. Since this adjustment factor is an increasing function of γ,
options on the stock will become more valuable, ceteris paribus, as returns
become more highly (negatively) autocorrelated. More specifically, (9.3.60)
may be rewritten as the following explicit function of ρ₁(τ):

\[ \sigma^2 = \frac{s^2[r_\tau(t)]}{\tau}\cdot\frac{\log\big(1 + 2\rho_1(\tau)\big)}{2\rho_1(\tau)}, \qquad \rho_1(\tau) \in \left(-\tfrac{1}{2}, 0\right]. \tag{9.3.61} \]

Holding fixed the unconditional variance of returns s²[r_τ(t)], as the absolute
value of the autocorrelation increases from 0 to ½, the value of σ² increases
without bound. This implies that a specification error in the dynamics of
p(t) can have dramatic consequences for pricing options.
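Expressions (9.3.59) and (9.3.61) translate directly into code. The sketch below uses a hypothetical function name. It recovers (γ, σ²) from a return variance and a first-order autocorrelation, treats ρ₁(τ) = 0 as the arithmetic Brownian motion limit in which the adjustment factor equals one, and confirms the round trip from assumed parameter values.

```python
import math


def trending_ou_params(s2, rho1, tau):
    """Invert (9.3.57)-(9.3.58): from the tau-period return variance s2 and the
    first-order autocorrelation rho1 in (-1/2, 0], return (gamma, sigma^2)."""
    if not -0.5 < rho1 <= 0.0:
        raise ValueError("the trending O-U process requires rho1 in (-1/2, 0]")
    if rho1 == 0.0:
        return 0.0, s2 / tau                  # limiting case: arithmetic Brownian motion
    gamma = -math.log(1.0 + 2.0 * rho1) / tau              # (9.3.59)
    factor = math.log(1.0 + 2.0 * rho1) / (2.0 * rho1)     # adjustment factor, >= 1
    return gamma, (s2 / tau) * factor                      # (9.3.61)


# Round trip: start from assumed (gamma, sigma^2), compute the moments, then invert.
g_true, s2_true, tau = 0.3, 0.0004, 1.0
var_tau = s2_true / g_true * (1.0 - math.exp(-g_true * tau))   # (9.3.57)
rho_tau = -0.5 * (1.0 - math.exp(-g_true * tau))                # (9.3.58)
print(trending_ou_params(var_tau, rho_tau, tau))
```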


We focus on the absolute value of the autocorrelation to avoid confusion in making
comparisons between results for negatively autocorrelated and positively autocorrelated asset
returns. See Lo and Wang (1995) for further details.
As the return interval τ decreases, it can be shown that the adjustment
factor to s²[r_τ(t)]/τ in (9.3.61) approaches unity (use L'Hôpital's rule).
In the continuous-time limit, the standard deviation of continuously
compounded returns is a consistent estimator for σ, and the effects of
predictability on σ vanish. The intuition is that σ is a measure of local
volatility (the volatility of infinitesimal price changes), and there is no
predictability over any infinitesimal time interval by construction (see
Section 9.1.1). Therefore, the influence of predictability on estimators for σ is
an empirical issue that depends on the degree of predictability relative to
how finely the data are sampled, and it must be addressed case by case.
However, the next section provides a numerical example in which
the magnitude of these effects is quantified for the Black-Scholes case.


Adjusting the Black-Scholes Formula for Predictability 

Expression (9.3.61) provides the necessary input to the Black-Scholes
formula for pricing options on an asset with trending O-U dynamics. If
the unconditional variance of daily returns is s²[r_τ(t)], and if the first-order
autocorrelation of τ-period returns is ρ₁(τ), then the price of a call option
is given by:

\[ G_{OU}(P(t), t) = P(t)\,\Phi(d_1) - X e^{-r(T-t)}\,\Phi(d_2), \tag{9.3.62} \]

where

\[ \sigma^2 = \frac{s^2[r_\tau(t)]}{\tau}\cdot\frac{\log\big(1 + 2\rho_1(\tau)\big)}{2\rho_1(\tau)}, \qquad \rho_1(\tau) \in \left(-\tfrac{1}{2}, 0\right], \tag{9.3.63} \]

Φ(·) is the standard normal cumulative distribution function, and d₁ and d₂
are defined in (9.2.17) and (9.2.18), respectively.

Expression (9.3.62) is simply the Black-Scholes formula with an adjusted
volatility input. The adjustment factor multiplying s²[r_τ(t)]/τ in (9.3.63) is
easily tabulated (see Lo and Wang [1995]). Hence in practice it is a simple
matter to adjust the Black-Scholes formula for negative autocorrelation of
the form (9.3.58): multiply the usual variance estimator s²[r_τ(t)]/τ by the
appropriate factor from Table 3 of Lo and Wang (1995), and use this as σ²
in the Black-Scholes formula.

Note that for all values of ρ₁(τ) in (−½, 0], the factor multiplying
s²[r_τ(t)]/τ in (9.3.63) is greater than or equal to one and increases in the
absolute value of the first-order autocorrelation coefficient. This implies that
option prices under the trending O-U specification are always greater than
or equal to prices under the standard Black-Scholes specification, and that
option prices are an increasing function of the absolute value of the
first-order autocorrelation coefficient. These are purely features of the trending
O-U process and do not generalize to other specifications of the drift (see
Lo and Wang [1995] for examples of other patterns).
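Putting (9.3.62) and (9.3.63) together gives a short pricing routine. The sketch assumes τ = 1 (daily moments), a 2% daily standard deviation, and a daily continuously compounded rate of log(1.05)/364, which are the inputs described for Table 9.5. Time-to-maturity is measured in days. Under those assumptions it should reproduce the at-the-money prices quoted in the text: $0.863 and $1.368 at 7 days, and $6.908, $7.234, and $10.343 at 364 days. The function names are illustrative.

```python
import math


def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(P, X, tau, r, sigma2):
    """Black-Scholes call; tau in days, r a daily rate, sigma2 a daily variance."""
    s = math.sqrt(sigma2 * tau)
    d1 = (math.log(P / X) + r * tau) / s + 0.5 * s
    return P * _norm_cdf(d1) - X * math.exp(-r * tau) * _norm_cdf(d1 - s)


def trending_ou_call(P, X, tau, r, s2_daily, rho1_daily):
    """(9.3.62)-(9.3.63) with tau = 1: Black-Scholes with the adjusted variance input."""
    if rho1_daily == 0.0:
        factor = 1.0
    else:
        factor = math.log(1.0 + 2.0 * rho1_daily) / (2.0 * rho1_daily)
    return bs_call(P, X, tau, r, s2_daily * factor)


r_daily = math.log(1.05) / 364          # daily continuously compounded rate
s2_daily = 0.02 ** 2                    # 2% daily standard deviation
for days in (7, 182, 364):
    row = [trending_ou_call(40.0, 40.0, days, r_daily, s2_daily, rho)
           for rho in (0.0, -0.05, -0.10, -0.20, -0.30, -0.40, -0.45)]
    print(days, [f"{c:.3f}" for c in row])
```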


Table 9.5. Option prices on assets with negatively autocorrelated returns.

                                Trending O-U Price, with Daily ρ₁(1) =
 Strike   Black-Scholes
 Price X      Price       −.05     −.10     −.20     −.30     −.40     −.45

 Time-to-Maturity T−t = 7 Days
   30        10.028     10.028   10.028   10.028   10.028   10.028   10.028
   35         5.036      5.037    5.038    5.042    5.051    5.074    5.108
   40         0.863      0.885    0.910    0.973    1.062    1.216    1.368
   45         0.011      0.013    0.016    0.024    0.041    0.082    0.137
   50         0.000      0.000    0.000    0.000    0.000    0.001    0.005

 Time-to-Maturity T−t = 182 Days
   30        11.285     11.336   11.394   11.548   11.786   12.238   12.725
   35         7.558      7.646    7.746    7.998    8.365    9.014    9.668
   40         4.740      4.851    4.976    5.286    5.728    6.491    7.244
   45         2.810      2.922    3.048    3.361    3.812    4.595    5.375
   50         1.592      1.687    1.797    2.073    2.482    3.214    3.963

 Time-to-Maturity T−t = 364 Days
   30        12.753     12.845   12.950   13.218   13.620   14.349   15.102
   35         9.493      9.622    9.769   10.133   10.661   11.582   12.501
   40         6.908      7.061    7.234    7.660    8.269    9.315   10.343
   45         4.941      5.102    5.283    5.732    6.374    7.478    8.566
   50         3.489      3.645    3.821    4.261    4.896    6.003    7.106

Comparison of Black-Scholes call option prices on a hypothetical $40 stock under an arithmetic
Brownian motion versus a trending Ornstein-Uhlenbeck process for log-prices, assuming a
standard deviation of 2% for daily continuously compounded returns and a daily continuously
compounded riskfree rate of log(1.05)/364. As autocorrelation becomes larger in absolute
value, option prices increase.


An Empirical Illustration

To gauge the empirical relevance of this adjustment for autocorrelation,
Table 9.5 compares Black-Scholes prices under arithmetic
Brownian motion and under the trending Ornstein-Uhlenbeck process for
various holding periods, strike prices, and daily autocorrelations from −5%
to −45%, for a hypothetical $40 stock. The unconditional standard deviation
of daily returns is held fixed at 2% per day. The Black-Scholes price is
calculated according to (9.2.16), setting σ equal to the unconditional standard
deviation. The trending O-U prices are calculated by solving (9.3.57)
and (9.3.58) for σ given τ and the return autocorrelations ρ₁(1) of −0.05,
−0.10, −0.20, −0.30, −0.40, and −0.45, and by using these values of σ in the
Black-Scholes formula (9.2.16), where τ = 1.




The first panel of Table 9.5 shows that even extreme autocorrelation in
daily returns does not affect short-maturity in-the-money call option prices
very much. For example, a daily autocorrelation of −45% has no impact
on the $30 7-day call; the price under the trending O-U process is identical
to the standard Black-Scholes price of $10.028. But even for such a short
maturity, differences become more pronounced as the strike price increases.
The at-the-money call is worth $0.863 in the absence of autocorrelation, but
its price increases to $1.368 with an autocorrelation of −45%.

However, as the time to maturity increases, the remaining panels of
Table 9.5 show that the impact of autocorrelation also increases. With a
−10% daily autocorrelation, an at-the-money 1-year call is worth $7.234, and its
price rises to $10.343 with a daily autocorrelation of −45%, compared to the
standard Black-Scholes price of $6.908. This is not surprising, since the
sensitivity of the Black-Scholes formula to σ (the option's vega) is an
increasing function of the time-to-maturity (see Section 9.2.1). From (9.2.28), we see that
for shorter-maturity options, changes in σ have very little impact on the call
price, but longer-maturity options will be more sensitive.

In general, the effects of asset return predictability on the price of
derivatives depend intimately on the precise nature of the predictability. For
example, the importance of autocorrelation for option prices hinges critically
on the degree of autocorrelation for a given return horizon τ and, of course,
on the data-generating process, which determines how rapidly this
autocorrelation decays with τ. For this reason, Lo and Wang (1995) introduce several
new stochastic processes that are capable of matching more complex
patterns of autocorrelation and predictability than the trending O-U process.


9.3.5 Implied Volatility Estimators 


Suppose the current market price of a one-year European call option on a 
nondividend-paying stock is $7.382. Suppose further that its strike price is 
$35, the current stock price is $40, and the annual simple riskfree interest 
rate is 5%. If the Black-Scholes model holds, then the volatility σ implied 
by the values given above can take on only one value, 0.200, because the 
Black-Scholes formula (9.2.16) yields a one-to-one relation between the 
option's price and σ, holding all other parameters fixed. Therefore, the 
option described above is said to have an implied volatility of 0.200 or 20%. So 
common is this notion of implied volatility that options traders often quote 
prices not in dollars but in units of volatility, e.g., "The one-year European 
call with $35 strike is trading at 20%." 
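Because the call price is strictly increasing in σ, the implied volatility can be found by a simple root search. The sketch below is our illustration, not the authors' procedure: it inverts the Black-Scholes call formula by bisection, assumes the 5% simple rate is converted to a continuously compounded r = ln 1.05, and uses hypothetical helper names (`bs_call`, `implied_vol`).

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s: float, k: float, t: float, sigma: float, r: float) -> float:
    """Black-Scholes price of a European call on a nondividend-paying stock."""
    srt = sigma * math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma * sigma) * t) / srt
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d1 - srt)


def implied_vol(price: float, s: float, k: float, t: float, r: float,
                lo: float = 1e-6, hi: float = 5.0, tol: float = 1e-10) -> float:
    """Invert bs_call for sigma by bisection (the call price is increasing in sigma)."""
    if not bs_call(s, k, t, lo, r) <= price <= bs_call(s, k, t, hi, r):
        raise ValueError("price is outside the range attainable by the formula")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, t, mid, r) < price:
            lo = mid  # model price too low: volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    r = math.log(1.05)  # 5% simple annual rate as a continuously compounded rate
    print(round(implied_vol(7.382, 40.0, 35.0, 1.0, r), 3))  # about 0.2
```

Bisection is slower than Newton's method but cannot diverge, which is why it is a convenient default for a sketch.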

Because implied volatilities are linked directly to current market prices 
(via the Black-Scholes formula), some investment professionals have argued 
that they are better estimators of volatility than estimators based on 
historical data, since they are based on current prices which presumably 
have expectations of the future impounded in them. 

However, such an argument overlooks the fact that an implied volatility 
is intimately related to a specific parametric option-pricing model (typically 
the Black-Scholes model) which, in turn, is intimately related to a particular 
set of dynamics for the underlying stock price (geometric Brownian motion 
in the Black-Scholes case). Herein lies the problem with implied volatilities: 
If the Black-Scholes formula holds, then the parameter σ can be recovered 
without error by inverting the Black-Scholes formula for any one option's price 
(each of which yields the same numerical value for σ); if the Black-Scholes 
formula does not hold, then the implied volatility is difficult to interpret 
since it is obtained by inverting the Black-Scholes formula. Therefore, using 
the implied volatility of one option to obtain a more accurate forecast of 
volatility to be used in pricing other options is either unnecessary or logically 
inconsistent. 

To see this more clearly, consider the argument that implied volatilities 
are better forecasts of future volatility because changing market conditions 
cause volatilities to vary stochastically through time, and historical volatilities 
cannot adjust to changing market conditions as rapidly. The folly of this 
argument lies in the fact that stochastic volatility contradicts the assumptions 
required by the Black-Scholes model: if volatilities do change stochastically 
through time, the Black-Scholes formula is no longer the correct pricing 
formula and an implied volatility derived from the Black-Scholes formula 
provides no new information. 

Of course, in this case the historical volatility estimator is equally useless, 
since it need not enter into the correct stochastic-volatility option-pricing 
formula (in fact it does not, as shown by Hull and White [1987], Wiggins 
[1987], and others; see Section 9.3.6). The correct approach is to use a 
historical estimator of the unknown parameters entering into the pricing 
formula. In the Black-Scholes case, the parameter σ is related to the 
historical volatility estimator of continuously compounded returns, but under 
other assumptions for the stock price dynamics, historical volatility need not 
play such a central role. 

This raises an interesting issue regarding the validity of the Black-Scholes 
formula. If the Black-Scholes formula is indeed correct, then the implied 
volatilities of any set of options on the same stock must be numerically identical. 
Of course, in practice they never are; thus the assumptions of the 
Black-Scholes model cannot literally be true. This should not come as a complete 
surprise; after all, the assumptions of the Black-Scholes model imply that 
options are redundant securities, which eliminates the need for organized 
options markets altogether. 

The difficulty lies in determining which of the many Black-Scholes 
assumptions are violated. If, for example, the Black-Scholes model fails 
empirically because stock prices do not follow a lognormal diffusion, we may 
be able to specify an alternate price process that fits the data better, in 
which case the "implied" parameter(s) of options on the same stock may 
indeed be numerically identical. Alternatively, if the Black-Scholes model 
fails empirically because in practice it is impossible to trade continuously 
due to transactions costs and other institutional constraints, then markets 
are never dynamically complete, options are never redundant securities, 
and we should never expect "implied" parameters of options on the same 
stock to be numerically identical for any option-pricing formula. In this 
case, the degree to which implied volatilities disagree may be an indication 
of how "redundant" options really are. 

The fact that options traders quote prices in terms of Black-Scholes 
implied volatilities has no direct bearing on their usefulness from a 
pricing point of view, but is a remarkable testament to the popularity of the 
Black-Scholes formula as a convenient heuristic. Quoting prices in terms of 
"ticks" rather than dollars has no far-reaching economic implications 
simply because there is a well-known one-to-one mapping between ticks and 
dollars. Moreover, just because options traders quote prices in terms of 
Black-Scholes implied volatilities, this does not imply that they are using 
the Black-Scholes model to set their prices. Implied volatilities do convey 
information, but this information is identical to the information contained 
in the market prices on which the implied volatilities are based. 


9.3.6 Stochastic-Volatility Models 


Several empirical studies have shown that the geometric Brownian motion 
(9.2.2) is not an appropriate model for certain security prices. For 
example, Beckers (1983), Black (1976), Blattberg and Gonedes (1974), Christie 
(1982), Fama (1965), Lo and MacKinlay (1988, 1990c), and Mandelbrot 
(1963, 1971) have documented important departures from (9.2.2) for US 
stock returns: skewness, excess kurtosis, serial correlation, and time-varying 
volatilities. Although each of these empirical regularities has implications 
for option pricing, it is the last one that has received the most attention in the 
recent derivatives literature, partly because volatility plays such a central 
role in the Black-Scholes/Merton formulation and in industry practice. 

See, for example, Amin and Ng (1993), Ball and Roma (1994), Beckers (1980), Cox 
(1975), Goldenberg (1991), Heston (1993), Hofmann, Platen, and Schweizer (1992), Hull 
and White (1987), Johnson and Shanno (1987), Scott (1987), and Wiggins (1987). 

If in the geometric Brownian motion model (9.2.2) σ is a known 
deterministic function of time σ(t), then the Black-Scholes formula still 
applies but with $\sigma^2 T$ replaced by the integrated variance $\int_0^T \sigma^2(s)\,ds$ over the 
option's life. However, if σ is stochastic, the situation becomes more complex. For 
example, suppose that the fundamental asset's dynamics are given by: 

$$ dP = \mu P\,dt + \sigma P\,dB_p \qquad (9.3.64) $$
$$ d\sigma = a(\sigma)\,dt + b(\sigma)\,dB_\sigma \qquad (9.3.65) $$

where a(·) and b(·) are arbitrary functions of volatility σ, and $B_p$ and $B_\sigma$ are 
standard Brownian motions with instantaneous correlation $dB_p\,dB_\sigma = \rho\,dt$. 
In this case, it may not be possible to determine the price of an option by 
arbitrage arguments alone, for the simple reason that there may not exist a 
dynamic self-financing portfolio strategy involving stocks and riskless bonds 
that can perfectly replicate the option's payoff. 
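To make the two sources of uncertainty in (9.3.64)-(9.3.65) concrete, here is a minimal simulation sketch. The functions a(σ) and b(σ) are supplied by the caller; the mean-reverting choice and parameter values in the usage lines, the exact log step for P given the current σ, the Euler step for σ, and the small floor that keeps σ positive are all our own assumptions, not part of the text. The second shock is built as $\rho z_1 + \sqrt{1-\rho^2}\,z_2$ so that the two Brownian increments have instantaneous correlation ρ; unless ρ = ±1, the stock shock alone cannot reproduce the volatility shock.

```python
import math
import random
from typing import Callable, List, Tuple


def simulate_sv_path(p0: float, sigma0: float, mu: float, rho: float,
                     a: Callable[[float], float], b: Callable[[float], float],
                     t: float, n: int, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Discretize dP = mu P dt + sigma P dB_p and dsigma = a(sigma) dt + b(sigma) dB_sigma.

    The Brownian increments satisfy dB_p dB_sigma = rho dt. P uses an exact log step given
    the current sigma; sigma uses an Euler step with a small floor (an ad hoc choice).
    """
    h = t / n
    sqrt_h = math.sqrt(h)
    prices, vols = [p0], [sigma0]
    p, s = p0, sigma0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # second shock constructed to have correlation rho with the first
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        p *= math.exp((mu - 0.5 * s * s) * h + s * sqrt_h * z1)
        s = max(s + a(s) * h + b(s) * sqrt_h * z2, 1e-8)
        prices.append(p)
        vols.append(s)
    return prices, vols


if __name__ == "__main__":
    kappa, theta, eta = 2.0, 0.20, 0.30  # hypothetical mean-reversion parameters
    prices, vols = simulate_sv_path(
        40.0, 0.20, 0.05, -0.5,
        a=lambda s: kappa * (theta - s), b=lambda s: eta * s,
        t=1.0, n=250, rng=random.Random(42))
    print(round(prices[-1], 2), round(vols[-1], 4))
```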

Heuristically, stochastic volatility introduces a second source of 
uncertainty into the replicating portfolio, and if this uncertainty ($B_\sigma$) is not 
perfectly correlated with the uncertainty inherent in the stock price process 
($B_p$), the replicating portfolio will not be able to "span" the possible 
outcomes that an option may realize at maturity (see Harrison and Kreps [1979] 
and Duffie and Huang [1985] for a more rigorous discussion). Of course, if 
σ were the price of a traded asset, then under relatively weak regularity 
conditions there would exist a dynamic self-financing portfolio strategy 
consisting of stocks, bonds, and the volatility asset that could perfectly replicate 
the option. 

In the absence of this additional hedging security, the only available 
method for pricing options in the presence of stochastic volatility of the 
form (9.3.65) is to appeal to a dynamic equilibrium model. Perhaps the 
simplest approach is to assert that the risk associated with stochastic volatility 
is not priced in equilibrium. This is the approach taken by Hull and White 
(1987) for the case where volatility follows a geometric Brownian motion 
(with constant coefficients φ and ξ): 

$$ dP = \mu P\,dt + \sigma P\,dB_p \qquad (9.3.66) $$
$$ d\sigma^2 = \phi\,\sigma^2\,dt + \xi\,\sigma^2\,dB_\sigma \qquad (9.3.67) $$

By assuming that volatility is uncorrelated with aggregate consumption, they 
show that equilibrium option prices are given by the expectation of the 
Black-Scholes formula, where the expectation is taken with respect to the 
average volatility over the option's life. 
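The Hull-White result can be illustrated numerically. In their formulation the relevant quantity is the average variance $\bar V = \frac{1}{T}\int_0^T \sigma^2(s)\,ds$, so this sketch (ours, not from the text) assumes the stock and volatility shocks are uncorrelated, treats a geometric Brownian motion for the variance with drift `phi` and volatility `xi` (our names) as the pricing dynamics with volatility risk unpriced, simulates the variance exactly on a grid, approximates $\bar V$ by the trapezoid rule, and averages the Black-Scholes price evaluated at $\sqrt{\bar V}$. The function name `hull_white_call` is hypothetical.

```python
import math
import random


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s: float, k: float, t: float, sigma: float, r: float) -> float:
    """Black-Scholes European call on a nondividend-paying stock."""
    srt = sigma * math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma * sigma) * t) / srt
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d1 - srt)


def hull_white_call(s: float, k: float, t: float, r: float, v0: float, phi: float, xi: float,
                    n_steps: int = 50, n_paths: int = 2000, seed: int = 0) -> float:
    """Average of Black-Scholes prices over simulated average variances (zero correlation).

    The variance V = sigma^2 follows dV = phi V dt + xi V dB_sigma, simulated exactly on a
    grid; Vbar = (1/T) * integral of V over [0, T] is approximated by the trapezoid rule.
    """
    rng = random.Random(seed)
    h = t / n_steps
    total = 0.0
    for _ in range(n_paths):
        v, integral = v0, 0.0
        for _ in range(n_steps):
            shock = xi * math.sqrt(h) * rng.gauss(0.0, 1.0)
            v_next = v * math.exp((phi - 0.5 * xi * xi) * h + shock)
            integral += 0.5 * (v + v_next) * h
            v = v_next
        total += bs_call(s, k, t, math.sqrt(integral / t), r)
    return total / n_paths


if __name__ == "__main__":
    r = math.log(1.05)
    print(round(bs_call(40.0, 40.0, 1.0, 0.2, r), 4))                    # constant volatility
    print(round(hull_white_call(40.0, 40.0, 1.0, r, 0.04, 0.0, 1.0), 4))  # hypothetical xi = 1
```

With `xi = 0` and `phi = 0` the average variance is just `v0`, and the function reduces to the Black-Scholes price.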

Using the dynamic equilibrium models of Garman (1976) and Cox, 
Ingersoll, and Ross (1985b), Wiggins (1987) derives the equilibrium price 
of volatility risk in an economy where agents possess logarithmic utility 
functions, yielding an equilibrium condition (in the form of a PDE with certain 
boundary conditions) for the instantaneous expected return of the option 
price. Other derivative-pricing models with stochastic volatility take similar 
approaches, the differences coming from the type of equilibrium model 
employed or the choice of preferences that agents exhibit. 
Parameter Estimation 
One of the most challenging aspects of stochastic-volatility models is the fact 
that realizations of the volatility process are unobservable, yet option-pricing 
formulas are invariably functions of the parameters of the process driving σ. 
To date, there has been relatively little attention devoted to this important 
issue for continuous-time processes like (9.3.64)-(9.3.65), primarily because 
of the difficulties inherent in estimating continuous-time models with 
discretely sampled data. However, a great deal of attention has been devoted to 
a related discrete-time model: the autoregressive conditional heteroskedasticity 
(ARCH) process of Engle (1982) and its many variants (see Chapter 12 
and Bollerslev, Chou, and Kroner [1992]). 

Although originally motivated by issues other than option pricing, 
ARCH models do capture the spirit of some of the corresponding 
continuous-time models. Recent studies by Nelson and Foster (1994), Nelson and 
Ramaswamy (1990), and Nelson (1991, 1992, 1996) provide some important 
links between the two. In particular, Nelson (1996) and Nelson and Foster 
(1994) derive the continuous-record asymptotics for several discrete-time 
ARCH processes, some of which converge to the continuous-time processes 
of Hull and White (1987) and Wiggins (1987). The empirical properties of 
these estimators have yet to be explored but will no doubt be the subject of 
future research. 


Discrete-Time Models 


Another approach is to begin with a discrete-time dynamic equilibrium 
model for option prices in which the fundamental asset's price dynamics 
are governed by an ARCH model. Although it is typically impossible to 
price securities by arbitrage in discrete time, continuous-time versions must 
appeal to equilibrium arguments as well in the case of stochastic volatility; 
hence there is little loss of generality in leaving the continuous-time 
framework altogether in this case. This is the approach taken by Amin and Ng 
(1993), who derive option-pricing formulas for a variety of price dynamics 
(stochastic volatility, stochastic consumption growth variance, stochastic 
interest rates, and systematic jumps) by applying the discrete-time dynamic 
equilibrium models of Brennan (1979) and Rubinstein (1976). 

Discrete-time models are also generally easier to implement empirically 
since virtually all historical data are sampled discretely, financial 
transactions are typically recorded at discrete intervals, parameter estimation and 
hypothesis testing involve discrete data records, and forecasts are produced 
at discrete horizons. For these reasons, there may be an advantage in 
modeling stochastic volatility via ARCH and pricing derivatives within a 
discrete-time equilibrium model. 

However, continuous-time models do offer other insights that are harder 
to come by within a discrete-time framework. For example, the dynamics 
of nonlinear functions of the data-generating process are almost impossible 
to obtain in discrete time, but in continuous time Itô's differentiation rule 
gives an explicit expression for such dynamics. Theoretical insights into 
the equilibrium structure of derivatives prices (for example, which state 
variables affect derivatives prices and which do not) are also more readily 
obtained in a continuous-time framework such as Cox, Ingersoll, and Ross 
(1985b). Therefore, each set of models offers some valuable insights that 
are not contained in the other. 


9.4 Pricing Path-Dependent Derivatives Via Monte Carlo Simulation 


Consider a contract at date 0 that gives the holder the right but not the 
obligation to sell one share of stock at date T for a price equal to the 
maximum of that stock's price over the period from 0 to T. Such a contract, 
often called a lookback option, is clearly a put option since it gives the holder 
the option to sell at a particular price at maturity. However, in this case the 
strike price is stochastic and determined only at the maturity date. Because 
the strike price depends on the path that the stock price takes from 0 to T, 
and not just on the terminal stock price P(T), such a contract is called a 
path-dependent option. 

Path-dependent options have become increasingly popular as the hedging 
needs of investors become ever more complex. For example, many 
multinational corporations now expend great efforts to hedge against 
exchange-rate fluctuations since large portions of their accounts receivable 
and accounts payable are denominated in foreign currencies. One of the 
most popular path-dependent options is the foreign currency average-rate or 
Asian option, which gives the holder the right to buy foreign currency at a 
rate equal to the average of the exchange rates over the life of the contract. 

The term "Asian" comes from the fact that such options were first actively written on stocks 
trading on Asian exchanges. Because these exchanges are usually smaller than their European 
and American counterparts, with relatively thin trading and low daily volume, prices on such 
exchanges are somewhat easier to manipulate. To minimize an option's exposure to the risk 
of stock price manipulation, a new option was created with the average of the stock prices over 
the option's life playing the role of the terminal stock price. 

Path-dependent options may be priced by the dynamic-hedging approach 
of Section 9.2.1, but the resulting PDE is often intractable. The 
risk-neutral pricing method offers a considerably simpler alternative in which 
the power of high-speed digital computers may be exploited. For example, 
consider the pricing of the option to sell at the maximum. If P(t) denotes 
the date-t stock price and H(0) is the initial value of this put, we have 

$$ H(0) = e^{-rT}\,E^*\!\left[\max_{0\le t\le T} P(t) - P(T)\right] \qquad (9.4.1) $$
$$ = e^{-rT}\,E^*\!\left[\max_{0\le t\le T} P(t)\right] - e^{-rT}\,E^*\!\left[P(T)\right] \qquad (9.4.2) $$
$$ = e^{-rT}\,E^*\!\left[\max_{0\le t\le T} P(t)\right] - P(0) \qquad (9.4.3) $$


where E* is the expectations operator with respect to the risk-neutral 
probability distribution or equivalent martingale measure. 

Observe that in going from (9.4.2) to (9.4.3) we have used the fact that 
the expected present value of P(T) discounted at the riskless rate r is P(0). 
This holds because we have used the risk-neutral expectations operator E*, 
and under the risk-neutral probabilities implicit in E* all assets must earn 
an expected return of r; hence $e^{-rT}E^*[P(T)] = P(0)$. 

To evaluate (9.4.3) via Monte Carlo simulation, we simulate many sample 
paths of {P(t)}, find the maximum value for each sample path, or replication, 
and average the present discounted value of the maxima over all the 
replications to yield an expected value over all replications, i.e., an estimate 
of H(0). Two issues arise immediately: How do we simulate a continuous 
sample path, and how many replications do we need for a reasonably precise 
estimate of H(0)? 


9.4.1 Discrete Versus Continuous Time 


By their very nature, digital computers are incapable of simulating truly 
continuous phenomena; but as a practical matter they are often capable 
of providing excellent approximations. In particular, if we divide our time 
interval [0, T] into n discrete intervals, each of length h, and simulate prices 
at each discrete date kh, k = 0, ..., n, the result will be an approximation 
to a continuous sample path which can be made successively more precise 
by allowing n to grow and h to shrink so as to keep T = nh fixed. 

For example, consider the case of geometric Brownian motion (9.2.2) 
for which the risk-neutral dynamics are given by 


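$$ dP(t) = rP(t)\,dt + \sigma P(t)\,dB(t) \qquad (9.4.4) $$

and consider simulating the following approximate sample path $\{\hat P_k\}$: 

$$ \hat P_k = P(0)\exp\!\left(\sum_{j=1}^{k}\epsilon_j\right), \quad k = 0, \ldots, n, \qquad \epsilon_j \overset{\text{IID}}{\sim} \mathcal{N}(\mu h,\ \sigma^2 h) \qquad (9.4.5) $$
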
Despite the fact that the simulated path $\{\hat P_k\}$ varies only at multiples of h, 
the approximation may be made arbitrarily precise by increasing n and 
therefore decreasing h; as n increases without bound, the simulated path 
converges weakly to the continuous-time process P(t) (see Section 9.1 for 
further discussion). Unfortunately, there are no general rules for how large 
n must be to yield an adequate approximation; choosing n must be done on 
a case-by-case basis. 
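The discretization (9.4.5) translates almost line for line into code. This is a minimal sketch under stated assumptions: `simulate_path` is a hypothetical helper, Python's `random.Random.gauss` supplies the normal draws, and the inputs in the usage lines are illustrative. Each call draws a fresh path, so, as in Table 9.6, paths for different n are independent of one another.

```python
import math
import random
from typing import List


def simulate_path(p0: float, r: float, sigma: float, t: float, n: int,
                  rng: random.Random) -> List[float]:
    """Approximate risk-neutral sample path (9.4.5) at dates kh, k = 0, ..., n, with h = T/n.

    Log increments eps_k are IID N(mu h, sigma^2 h), with mu = r - sigma^2/2.
    """
    h = t / n
    mu = r - 0.5 * sigma * sigma
    log_p = math.log(p0)
    path = [p0]
    for _ in range(n):
        log_p += rng.gauss(mu * h, sigma * math.sqrt(h))
        path.append(math.exp(log_p))
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (12, 52, 250):  # monthly, weekly, and daily monitoring
        path = simulate_path(40.0, math.log(1.05), 0.2, 1.0, n, rng)
        print(n, len(path), round(max(path), 4))
```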
9.4.2 How Many Simulations to Perform 


We can, however, provide some clear guidelines for choosing the number 
of replications m to simulate. Recall that our Monte Carlo estimate $\hat H(0)$ of H(0) 
involves a simple average across replications: 

$$ \hat H(0) = e^{-rT}\,\frac{1}{m}\sum_{j=1}^{m} Y_{jn} - P(0), \qquad Y_{jn} \equiv \max_{0\le k\le n} P_{jk} \qquad (9.4.6) $$


where $\{P_{jk}\}$ is the jth replication or sample path of the stock price process 
under the risk-neutral distribution which, in the case of (9.4.4), implies that 
μ = r − σ²/2. But since by construction the $Y_{jn}$'s are IID random variables with 
finite positive variance, the Central Limit Theorem implies that for large m: 

$$ \sqrt{m}\left(\hat H(0) - H(0)\right) \overset{a}{\sim} \mathcal{N}\!\left(0,\ \sigma^2(n)\right), \qquad \sigma^2(n) \equiv \mathrm{Var}\!\left[e^{-rT}\,Y_{jn}\right] \qquad (9.4.7) $$


Therefore, for large m an approximate 95% confidence interval for H(0) 
may be readily constructed from (9.4.7): 

$$ \hat H(0) - \frac{1.96\,\sigma(n)}{\sqrt{m}} \;\le\; H(0) \;\le\; \hat H(0) + \frac{1.96\,\sigma(n)}{\sqrt{m}} \qquad (9.4.8) $$


The choice of m thus depends directly on the desired accuracy of $\hat H(0)$. 
If, for example, we require an $\hat H(0)$ that is within $0.001 of H(0) with 95% 
confidence, m must be chosen so that: 

$$ \frac{1.96\,\sigma(n)}{\sqrt{m}} \le 0.001 \quad\Longleftrightarrow\quad m \ge 3{,}841{,}600\,\sigma^2(n) \qquad (9.4.9) $$


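Typically $\mathrm{Var}[Y_{jn}]$ is not known, but it can be readily estimated from the 
simulations in the obvious way: 

$$ \widehat{\mathrm{Var}}[Y_{jn}] = \frac{1}{m}\sum_{j=1}^{m}\left(Y_{jn} - \bar Y_n\right)^2, \qquad \bar Y_n \equiv \frac{1}{m}\sum_{j=1}^{m} Y_{jn} \qquad (9.4.10) $$

and σ²(n) is then estimated by $e^{-2rT}$ times this quantity. 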


Since the replications are IID by construction, estimators such as (9.4.10) 
will generally be very well-behaved, converging in probability to their 
expectations rapidly and, when properly normalized, converging in distribution 
just as rapidly to their limiting distributions. 
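Equations (9.4.6)-(9.4.10) combine into a short crude Monte Carlo routine. This sketch is ours: the function names are hypothetical, and the volatility input assumes (our assumption, not stated in the text) that the 15% mean and 20% standard deviation of simple returns used in Section 9.4.3 are converted to a continuously compounded σ through σ² = ln(1 + 0.20²/1.15²), with r = ln 1.05. The standard error uses the 1/m variance estimator of (9.4.10).

```python
import math
import random
import statistics
from typing import Tuple


def lookback_put_crude(p0: float, r: float, sigma: float, t: float, n: int, m: int,
                       seed: int = 0) -> Tuple[float, float, Tuple[float, float]]:
    """Crude Monte Carlo estimator (9.4.6) of the option to sell at the maximum.

    Returns the estimate, its standard error sigma_hat(n)/sqrt(m), and the 95% interval (9.4.8).
    """
    rng = random.Random(seed)
    h = t / n
    drift, vol = (r - 0.5 * sigma * sigma) * h, sigma * math.sqrt(h)
    disc = math.exp(-r * t)
    discounted_maxima = []
    for _ in range(m):
        log_p = log_max = math.log(p0)
        for _ in range(n):
            log_p += rng.gauss(drift, vol)
            log_max = max(log_max, log_p)
        discounted_maxima.append(disc * math.exp(log_max))  # e^{-rT} Y_jn
    estimate = statistics.fmean(discounted_maxima) - p0     # minus P(0), as in (9.4.3)
    sd = statistics.pstdev(discounted_maxima)               # 1/m normalization, as in (9.4.10)
    se = sd / math.sqrt(m)
    return estimate, se, (estimate - 1.96 * se, estimate + 1.96 * se)


def replications_needed(sd: float, tol: float, z: float = 1.96) -> int:
    """Smallest m with z * sd / sqrt(m) <= tol, i.e. the condition in (9.4.9)."""
    return math.ceil((z * sd / tol) ** 2)


if __name__ == "__main__":
    r = math.log(1.05)
    sigma = math.sqrt(math.log(1.0 + 0.20 ** 2 / 1.15 ** 2))  # assumed conversion, see lead-in
    est, se, ci = lookback_put_crude(40.0, r, sigma, 1.0, n=100, m=4000)
    print(round(est, 4), round(se, 4), ci)
    print(replications_needed(se * math.sqrt(4000), 0.001))  # m for a $0.001 half-width
```

Because the loop monitors the price only at n dates, the estimate targets the discretely monitored price, which lies below the continuous-time value discussed in the next section.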


9.4.3 Comparisons with a Closed-Form Solution 


In the special case of the option to sell at the maximum with a geometric 
Brownian motion price process, a closed-form solution for the option price 
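is given by Goldman, Sosin, and Gatto (1979): 

$$ H(0) = P(0)\,\Phi\!\left(\frac{(a+\sigma^2)\sqrt{T}}{\sigma}\right)\left(1+\frac{\sigma^2}{2r}\right) - P(0) + P(0)\,e^{-rT}\left(1-\frac{\sigma^2}{2r}\right)\Phi\!\left(-\frac{a\sqrt{T}}{\sigma}\right) \qquad (9.4.11) $$

where a ≡ r − σ²/2 and Φ(·) is the standard normal cumulative distribution function. 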
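Formula (9.4.11) is straightforward to evaluate. In this sketch the function name is hypothetical, r must be continuously compounded and positive (the formula divides by 2r), and the parameter mapping in the usage lines, r = ln 1.05 and σ² = ln(1 + 0.20²/1.15²), is our reading of how the simple-return assumptions listed below translate into continuous-time parameters. In our own check, those inputs give approximately 4.794, in line with the H(0) = $4.7937 quoted in the text.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lookback_put_gsg(p0: float, r: float, sigma: float, t: float) -> float:
    """Continuous-time value (9.4.11) at date 0 of the option to sell at the maximum.

    r is continuously compounded and must be positive (the formula divides by 2r).
    """
    a = r - 0.5 * sigma * sigma
    ratio = sigma * sigma / (2.0 * r)
    rt = math.sqrt(t)
    return (p0 * norm_cdf((a + sigma * sigma) * rt / sigma) * (1.0 + ratio)
            - p0
            + p0 * math.exp(-r * t) * (1.0 - ratio) * norm_cdf(-a * rt / sigma))


if __name__ == "__main__":
    r = math.log(1.05)
    sigma = math.sqrt(math.log(1.0 + 0.20 ** 2 / 1.15 ** 2))  # assumed conversion
    print(round(lookback_put_gsg(40.0, r, sigma, 1.0), 4))
```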


Therefore, in this case we may compare the accuracy of the Monte Carlo 
estimator $\hat H(0)$ with the theoretical value H(0). Table 9.6 provides such a 
comparison under the following assumptions (for simple returns): 

Annual Riskfree Interest Rate = 5% 
Annual Expected Stock Return = 15% 
Annual Standard Deviation of Stock Return = 20% 
Initial Stock Price P(0) = $40 
Time to Maturity T = 1 Year. 


From the entries in Table 9.6, we see that large differences between the 
continuous-time price H(0) = $4.7937 and the crude Monte Carlo estimator 
$\hat H(0)$ can arise, even when m and n are relatively large (the antithetic 
estimator is defined and discussed in the next section). For example, H(0) and 
$\hat H(0)$ differ by 30 cents when n = 250, a nontrivial discrepancy given the 
typical sizes of options portfolios. 

The difference between $\hat H(0)$ and H(0) arises from two sources: sampling 
variation in $\hat H(0)$ and the discreteness of the simulated sample paths 
of prices. The former source of discrepancy is controlled by the number 
of replications m, while the latter source is controlled by the number of 
observations n in each simulated sample path. Increasing m will allow us to 
estimate $E^*[\hat H(0)]$ with arbitrary accuracy, but if n is fixed then $E^*[\hat H(0)]$ 
need not converge to the continuous-time price H(0). Does this 
discrepancy imply that Monte Carlo estimators are inferior to closed-form solutions 
when such solutions are available? Not necessarily. 

This difference highlights the importance of discretization in the 
pricing of path-dependent securities. Since we are selecting the maximum of 
the exponentials of the (discrete) partial sums $\sum_{j=1}^{k}\epsilon_j$, where k ranges from 0 
to n, as n increases the maximum is likely to increase as well. Heuristically, 
the maximum of the daily closing prices of P over the year (n = 250 trading 
days) must be lower than the maximum of the daily highs over that same 
year (n = ∞). Therefore, the continuous-time price H(0), which is closer 
to the maximum of the daily highs, will almost always exceed the simulation 
price $\hat H(0)$, which is discretized. 

Although it is probable that the maximum of the partial sum will increase with n, it 
is not guaranteed. As we increase n in Table 9.6, we generate a new independent random 
sequence {ε_k}, and there is always some chance that this new sequence with more terms will 
nevertheless yield smaller partial sums. 

Table 9.6. Monte Carlo estimation of lookback option price. 

                   Crude                    Antithetic 
      n       Ĥ(0)      SE[Ĥ(0)]       H̃(0)      SE[H̃(0)] 
    100      4.3818     0.0165        4.3044     0.0066 
    250      4.4911     0.0164        4.5136     0.0066 
    365      4.5470     0.0165        4.5603     0.0066 
    500      4.5746     0.0165        4.6007     0.0066 
    750      4.6529     0.0166        4.6414     0.0066 
  1,000      4.6448     0.0166        4.6493     0.0066 
  2,000      4.6700     0.0166        4.7001     0.0067 
  5,000      4.7175     0.0165        4.7269     0.0066 

Monte Carlo estimator of the price of a one-year lookback put option with continuous-time 
(Goldman-Sosin-Gatto) price H(0) = 4.7937. Each row corresponds to an independent set of 
simulations of 100,000 replications of sample paths of length n. For the antithetic-variates 
simulations, each sequence of IID random variates is used twice (the original sequence and 
its negative), yielding a total of 200,000 sample paths, or 100,000 negatively correlated pairs 
of paths. SE[Ĥ(0)] and SE[H̃(0)] are the standard errors of Ĥ(0) and H̃(0), respectively. 

Which price is more relevant depends, of course, on the terms of the 
particular contract. For example, average rate options on foreign exchange 
usually specify particular dates on which the exchange rate is measured, and 
it is almost always either a market closing rate (such as the corresponding 
spot rate of the IMM futures closing) or a central bank fixing rate. In both 
cases, the more relevant price would be the simulation price, since the path 
dependence is with respect to the discrete set of measured rates, and not an 
idealized continuous process. 


9.4.4 Computational Efficiency 


The two main concerns of any Monte Carlo simulation are accuracy and 
computational cost, and in most cases there will be tradeoffs between the 
two. As we saw in Section 9.4.2, the standard error of the Monte Carlo 
estimator $\hat H(0)$ is inversely proportional to the square root of the number 
of replications m; hence a 50% reduction in the standard error requires 
four times the number of replications, and so on. This type of Monte 
Carlo procedure is often described as crude Monte Carlo (see Hammersley 
and Handscomb [1964], for example), for obvious reasons. Therefore, a 
number of variance-reduction techniques have been developed to improve 
the efficiency of simulation estimators. Although a thorough discussion of 
these techniques is beyond the scope of this text, we shall briefly review a 
few of them here. 

A simple technique for improving the performance of Monte Carlo 
estimators is to replace estimates by their population counterparts whenever 
possible, for this reduces sampling variation in the estimator. For example, 
when simulating risk-neutralized asset returns, the sample mean of each 
replication will almost never be equal to its population mean (the riskless 
rate), but we can correct this sampling variation easily by adding the 
difference between the riskless rate and the sample mean to each observation of 
the replication. If this is done for each replication, the result will be a set 
of replications with no sampling error for the mean. The efficiency gain 
depends on the extent to which sampling errors for the mean contribute 
to the overall sampling variation of the simulation, but in many cases the 
improvement can be dramatic. 
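The mean correction can be sketched as follows. This is our illustration: `mean_corrected_increments` is a hypothetical helper, and for the log increments of (9.4.5) the relevant population mean is μh = (r − σ²/2)h rather than the riskless rate itself. Note that shifting every draw by the same amount makes the draws no longer independent, so the sketch shows only the mechanics; whether the correction helps, or instead introduces bias, depends on the payoff being priced.

```python
import math
import random
import statistics
from typing import List


def mean_corrected_increments(n: int, mean: float, sd: float,
                              rng: random.Random) -> List[float]:
    """Draw n IID N(mean, sd^2) increments, then shift every draw by the same amount so
    that the replication's sample mean equals the known population mean exactly."""
    draws = [rng.gauss(mean, sd) for _ in range(n)]
    shift = mean - statistics.fmean(draws)
    return [d + shift for d in draws]


if __name__ == "__main__":
    r, sigma, h = math.log(1.05), 0.20, 1.0 / 250  # hypothetical daily grid
    target = (r - 0.5 * sigma * sigma) * h          # population mean of eps_k under (9.4.5)
    eps = mean_corrected_increments(250, target, sigma * math.sqrt(h), random.Random(3))
    print(statistics.fmean(eps) - target)           # zero up to floating-point rounding
```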

A related technique is to exploit other forms of population information. 
For example, suppose we wish to estimate $E^*[f(X)]$ and we find a 
random variable g(Y) such that $E^*[g(Y)]$ is close to $E^*[f(X)]$ and $E^*[g(Y)]$ 
is known (this is the population information to be exploited). $E^*[f(X)]$ 
might be the price of a newly created path-dependent derivative which must 
be estimated, and $E^*[g(Y)]$ the market price of an existing derivative with 
similar characteristics, hence a similar expectation. By expressing $E^*[f(X)]$ 
as the sum of $E^*[g(Y)]$ and $E^*[f(X) - g(Y)]$, the expectation to be 
estimated is decomposed into two terms where the first term is known and the 
second term can be simulated with much smaller sampling variation. This 
technique is known as the control variate method (g(Y) is the control variate 
for f(X)), and its success depends on how close $E^*[g(Y)]$ is to $E^*[f(X)]$. 
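A minimal sketch of this decomposition, using an example of our own rather than the pair of derivatives described above: f is the maximum of a simulated discrete path and g is its terminal price P(T), whose risk-neutral expectation $P(0)e^{rT}$ is known. Both estimators use the same paths; the control-variate version simulates only f − g. The function names and inputs are hypothetical, and this is the simple coefficient-one form described in the text (regression-weighted control variates are a common refinement not shown here).

```python
import math
import random
import statistics
from typing import List, Tuple


def _mean_and_se(xs: List[float]) -> Tuple[float, float]:
    """Sample mean and its standard error."""
    return statistics.fmean(xs), statistics.stdev(xs) / math.sqrt(len(xs))


def max_price_control_variate(p0: float, r: float, sigma: float, t: float, n: int, m: int,
                              seed: int = 0) -> Tuple[float, float, float, float]:
    """Estimate E*[max_k P_k] from one set of simulated paths in two ways.

    crude:           average of f = max_k P_k
    control variate: E*[g] + average of (f - g), with g = P(T) and E*[P(T)] = P(0) e^{rT}.
    Returns (crude, crude_se, cv, cv_se).
    """
    rng = random.Random(seed)
    h = t / n
    drift, vol = (r - 0.5 * sigma * sigma) * h, sigma * math.sqrt(h)
    f_vals, diffs = [], []
    for _ in range(m):
        log_p = log_max = math.log(p0)
        for _ in range(n):
            log_p += rng.gauss(drift, vol)
            log_max = max(log_max, log_p)
        f, g = math.exp(log_max), math.exp(log_p)  # path maximum and terminal price
        f_vals.append(f)
        diffs.append(f - g)
    crude, crude_se = _mean_and_se(f_vals)
    diff_mean, cv_se = _mean_and_se(diffs)
    known_g = p0 * math.exp(r * t)  # the population information being exploited
    return crude, crude_se, known_g + diff_mean, cv_se


if __name__ == "__main__":
    print(max_price_control_variate(40.0, math.log(1.05), 0.2, 1.0, n=50, m=2000))
```

The gain here comes from the strong positive correlation between the path maximum and the terminal price, so f − g varies much less than f.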

Another form of population information that can be exploited is 
symmetry. If, for example, the population distribution is symmetric about its mean 
and this mean is known (as in the case of risk-neutralized asset returns), 
then m/2 replications can yield m sample paths since each replication can 
be "reflected" through its mean to produce a mirror image which has the 
same statistical properties. This approach yields an added benefit: 
negative correlation among pairs of replications. If the summands of the Monte 
Carlo estimator are monotone functions of the replications, they will also be 
negatively correlated, implying a smaller variance for the estimator. 

Several texts provide excellent coverage of this topic. Hammersley and Handscomb 
(1964) is a classic, concise but complete. Kalos and Whitlock (1986) provide a more detailed 
and updated exposition of similar material. Fishman (1996) is considerably more 
comprehensive and covers several advanced topics not found in other Monte Carlo texts, such as 
Markov chain sampling, Gibbs sampling, random tours, and simulated annealing. Fishman (1996) 
also contains many applications, explicit algorithms for many of the techniques covered, and 
FORTRAN software (from an ftp site) for random number generation. Finally, Fang and Wang 
(1994) present a compact introduction to a new approach to Monte Carlo simulation based on 
purely deterministic sampling. Although it is still too early to tell how this approach compares 
to the more traditional methods, Fang and Wang (1994) provide some intriguing examples 
that look quite promising. 

This is a simple example of a more general technique known as the 
antithetic variates method, in which correlation is induced across replications 
to reduce the variance of the sum. A more formal motivation for this 
approach comes from the following theorem: for any estimator which can be 
expressed as the sum of random variables, it is always possible to create a 
strict functional dependence between the summands which leaves the 
estimator unbiased but yields a variance that comes arbitrarily close to the 
minimum variance possible with these random variables (see Hammersley 
and Mauldon [1956]). Of course, the challenge is to construct such a 
functional dependence, but even if the optimal transformation is not apparent, 
substantial efficiency gains can be achieved by simpler kinds of dependence. 

Variance reduction can also be accomplished by more sophisticated 
sampling methods. In stratified sampling, the support of the basic random 
variable X being simulated is partitioned into a finite number of intervals 
and crude Monte Carlo simulations are performed in each interval. If there 
is less variation in f(X) within intervals than across the intervals, the sampling 
variation of the estimator of E[f(X)] will be reduced. 
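Here is a one-dimensional sketch of stratified sampling, under our own assumptions: the uniform interval (0, 1) is split into equal-probability strata, one uniform draw is taken in each stratum, and each draw is mapped to a standard normal with `statistics.NormalDist().inv_cdf` (Python 3.8 or later). The test function, a discounted European call payoff with the inputs of Section 9.3.5 (P(0) = $40, strike $35, one year, σ = 0.20, r = ln 1.05), is chosen only because its exact value, about $7.382, is known; the helper names are hypothetical.

```python
import math
import random
import statistics
from typing import Callable


def stratified_normal_mean(f: Callable[[float], float], n_strata: int,
                           rng: random.Random) -> float:
    """Estimate E[f(Z)], Z ~ N(0,1), with one draw in each of n_strata equal-probability strata.

    Stratum i is the interval [i/n, (i+1)/n) of U(0,1), mapped to Z by the inverse normal CDF.
    With equal stratum probabilities the stratified estimator is the simple average.
    """
    inv_cdf = statistics.NormalDist().inv_cdf
    total = 0.0
    for i in range(n_strata):
        u = (i + rng.random()) / n_strata
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
        total += f(inv_cdf(u))
    return total / n_strata


def crude_normal_mean(f: Callable[[float], float], m: int, rng: random.Random) -> float:
    """Crude Monte Carlo estimate of E[f(Z)], for comparison."""
    return statistics.fmean(f(rng.gauss(0.0, 1.0)) for _ in range(m))


def discounted_call_payoff(z: float, s: float = 40.0, k: float = 35.0, t: float = 1.0,
                           sigma: float = 0.20, r: float = math.log(1.05)) -> float:
    """e^{-rT} max(P(T) - K, 0) with lognormal P(T) under the risk-neutral measure."""
    p_t = s * math.exp((r - 0.5 * sigma * sigma) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(p_t - k, 0.0)


if __name__ == "__main__":
    rng = random.Random(7)
    print(round(stratified_normal_mean(discounted_call_payoff, 2000, rng), 4))
    print(round(crude_normal_mean(discounted_call_payoff, 2000, rng), 4))
```

With the same number of draws, the stratified estimate typically lands much closer to the exact value than the crude one, because almost all of the remaining variation is within strata.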

Importance sampling is a more sophisticated version, sampling more 
frequently in regions of the support where there is more variation in f(X) 
(where sampling is more "important") instead of sampling at regular 
intervals. An even more sophisticated version of this method has recently been 
proposed by Fang and Wang (1994), in which replications are generated 
deterministically, not randomly, according to an algorithm that is designed to 
minimize the sampling variation of the estimator directly. It is still too early 
to say how this approach, called the number-theoretic method, compares to 
the more traditional Monte Carlo estimators, but it has already found its way 
into the financial community (see, for example, Paskov and Traub [1995]) 
and the preliminary findings seem encouraging. 
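Importance sampling is usually implemented by drawing from a different density and reweighting each draw by the likelihood ratio, so that the estimator stays unbiased. The toy sketch below (ours, not from the text) estimates a small normal tail probability P(Z > c), the kind of quantity a deep out-of-the-money digital payoff reduces to under lognormal prices. Shifting the sampling density to N(c, 1) places about half of the draws in the region where the payoff is nonzero. The function names and the choice c = 4 are illustrative assumptions.

```python
import math
import random


def tail_prob_crude(c: float, m: int, rng: random.Random) -> float:
    """Crude estimate of P(Z > c): the fraction of N(0,1) draws above c."""
    return sum(rng.gauss(0.0, 1.0) > c for _ in range(m)) / m


def tail_prob_importance(c: float, m: int, rng: random.Random) -> float:
    """Importance-sampling estimate of P(Z > c), Z ~ N(0,1).

    Draws come from N(c, 1); each draw above c is reweighted by the likelihood ratio
    phi(x) / phi(x - c) = exp(-c x + c^2 / 2).
    """
    total = 0.0
    for _ in range(m):
        x = rng.gauss(c, 1.0)
        if x > c:
            total += math.exp(-c * x + 0.5 * c * c)
    return total / m


if __name__ == "__main__":
    c, m = 4.0, 20000
    print(0.5 * math.erfc(c / math.sqrt(2.0)))          # exact tail probability
    print(tail_prob_crude(c, m, random.Random(1)))       # typically 0 or a handful of hits
    print(tail_prob_importance(c, m, random.Random(1)))
```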


An Illustration of Variance Reduction 

To illustrate the potential power of variance-reduction techniques, we 
construct an antithetic-variates estimator of the price of the one-year lookback 
put option of Section 9.4.3. For each simulated price path $\{P_{jk}\}$, k = 0, ..., n, another 
can be obtained without further simulation by reversing the sign of each of 
the randomly generated IID standard normal variates on which the price 
path is based, yielding a second path $\{\tilde P_{jk}\}$ which is negatively correlated 
with the first. If m sample paths of $\{P_{jk}\}$ are generated, the resulting 
antithetic-variates estimator $\tilde H(0)$ is simply the average across all 2m paths: 


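$$ \tilde H(0) = e^{-rT}\,\frac{1}{2m}\sum_{j=1}^{m}\left(Y_{jn} + \tilde Y_{jn}\right) - P(0) \qquad (9.4.12) $$

where 

$$ Y_{jn} \equiv \max_{0\le k\le n} P_{jk}, \qquad \tilde Y_{jn} \equiv \max_{0\le k\le n} \tilde P_{jk}. $$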


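The relation between antithetic-variates and crude Monte Carlo can be more 
easily seen by rewriting (9.4.12) as 

$$ \tilde H(0) = \frac{1}{2}\left(e^{-rT}\,\frac{1}{m}\sum_{j=1}^{m} Y_{jn} - P(0)\right) + \frac{1}{2}\left(e^{-rT}\,\frac{1}{m}\sum_{j=1}^{m} \tilde Y_{jn} - P(0)\right) \qquad (9.4.13) $$
$$ = e^{-rT}\,\frac{1}{m}\sum_{j=1}^{m} \frac{Y_{jn} + \tilde Y_{jn}}{2} - P(0) \qquad (9.4.14) $$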

Equation (9.4.13) shows that $\tilde H(0)$ is based on a simple average of two 
averages, one based on the sample paths $\{P_{jk}\}$ and the other based on 
$\{\tilde P_{jk}\}$. The fact that these two averages are negatively correlated leads to 
a reduction in variance. 

Equation (9.4.14) combines the two sums of (9.4.13) into one, with the 
averages of the antithetic pairs as the summands. This sum is particularly 
easy to analyze because the summands are IID (the correlation is confined 
within each summand, not across the summands); hence the variance of the 
sum is simply the sum of the variances. An expression for the variance of 
$\tilde H(0)$ then follows readily: 


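$$ \mathrm{Var}[\tilde H(0)] = \frac{e^{-2rT}}{m}\,\mathrm{Var}\!\left[\frac{Y_{jn} + \tilde Y_{jn}}{2}\right] \qquad (9.4.15) $$
$$ = \frac{e^{-2rT}}{m}\left(\frac{1}{2}\mathrm{Var}[Y_{jn}] + \frac{1}{2}\mathrm{Cov}[Y_{jn}, \tilde Y_{jn}]\right) $$
$$ = \frac{\sigma^2(n)}{2m}\,(1+\rho) \qquad (9.4.16) $$

where $\sigma^2(n) \equiv \mathrm{Var}[e^{-rT}Y_{jn}] = \mathrm{Var}[e^{-rT}\tilde Y_{jn}]$ and $\rho \equiv \mathrm{Corr}[e^{-rT}Y_{jn},\ e^{-rT}\tilde Y_{jn}]$. 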
Equation (9.4.15) shows that the variance of $\tilde{H}(0)$ can be estimated by the product of $e^{-2rT}/m$ and the sample variance of the IID sequence $\{(Y_j+\tilde{Y}_j)/2\}$. There is no need to account for the correlation between antithetic pairs because this is implicitly accounted for in the sample variance of $\{(Y_j+\tilde{Y}_j)/2\}$.
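The crude estimator, the antithetic estimator (9.4.14), and the standard error implied by (9.4.15) can be sketched in Python with NumPy. This is a minimal illustration, not the simulation behind Table 9.6: it assumes the stock follows a risk-neutral geometric Brownian motion with log drift $r - \sigma^2/2$ per unit time, monitors the maximum at $n$ equally spaced dates, and uses hypothetical parameter values; the function name `lookback_sell_at_max` is our own.

```python
import numpy as np

def lookback_sell_at_max(p0, r, sigma, T, n, m, seed=0):
    """Crude and antithetic-variates Monte Carlo estimates of H(0), the value of
    the option to sell at the maximum, under an assumed risk-neutral geometric
    Brownian motion monitored at n equally spaced dates."""
    rng = np.random.default_rng(seed)
    h = T / n
    drift = (r - 0.5 * sigma ** 2) * h        # risk-neutral log-price drift per step
    z = rng.standard_normal((m, n))           # IID N(0, 1) variates, one row per path

    def path_max(shocks):
        # Log price path from cumulative increments; the maximum includes P_0 (k = 0).
        log_p = np.log(p0) + np.cumsum(drift + sigma * np.sqrt(h) * shocks, axis=1)
        return np.maximum(p0, np.exp(log_p).max(axis=1))

    disc = np.exp(-r * T)
    y = path_max(z)            # Y_j from the original paths
    y_tilde = path_max(-z)     # antithetic paths: same variates with signs reversed

    # Crude estimator using only the m original paths.
    crude = disc * y.mean() - p0
    crude_se = disc * y.std(ddof=1) / np.sqrt(m)

    # Antithetic estimator (9.4.14): average of the IID pair means (Y_j + Y~_j)/2.
    pair_mean = 0.5 * (y + y_tilde)
    anti = disc * pair_mean.mean() - p0
    anti_se = disc * pair_mean.std(ddof=1) / np.sqrt(m)   # square root of (9.4.15)
    return crude, crude_se, anti, anti_se

crude, crude_se, anti, anti_se = lookback_sell_at_max(
    p0=100.0, r=0.05, sigma=0.20, T=1.0, n=253, m=10_000)
print(f"crude {crude:.3f} (se {crude_se:.4f}), antithetic {anti:.3f} (se {anti_se:.4f})")
print("se ratio:", anti_se / crude_se)
```

The $2m$ antithetic paths cost only a sign change on variates already drawn, and the printed ratio `anti_se / crude_se` is a sample estimate of $\sqrt{(1+\rho)/2}$.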

Equation (9.4.16) provides additional insight into the variance reduction that antithetic variates afford. The reduction in variance comes from two sources: a doubling of the number of replications from $m$ to $2m$, and the factor $1+\rho$, which should be less than one if the correlation between the antithetic variates is negative. Note that even if the correlation is positive, the variance of $\tilde{H}(0)$ will still be lower than that of the crude Monte Carlo estimator $\hat{H}(0)$ unless there is perfect correlation, i.e., $\rho = 1$. Also, while we have doubled the number of replications, we have done so in a computationally trivial way: changing signs. Since the computations involved in pseudorandom number generation are typically more demanding than mere sign changes, this is another advantage of antithetic-variates simulations.

A comparison of the crude Monte Carlo estimator $\hat{H}(0)$ to the antithetic-variates estimator $\tilde{H}(0)$ is provided in Table 9.6. For most of the simulations, the ratio of the standard error of $\tilde{H}(0)$ to the standard error of $\hat{H}(0)$ is $0.0066/0.0165 = 0.400$, a reduction of about 60%. In comparison, a doubling of the number of replications from $m$ to $2m$ for the crude Monte Carlo estimator would yield a ratio of $1/\sqrt{2} = 0.707$, only about a 30% reduction. More formally, observe from (9.4.7) and (9.4.16) that the ratio of the standard error of $\tilde{H}(0)$ to the standard error of $\hat{H}(0)$ is an estimator of $\sqrt{(1+\rho)/2}$; hence the ratio $0.0066/0.0165 = 0.400$ implies a correlation of $-68\%$ between the antithetic pairs of the simulations in Table 9.6, a substantial value which is responsible for the dramatic reduction in variance of $\tilde{H}(0)$.
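The arithmetic linking the standard-error ratio to the implied correlation can be checked directly: inverting $\text{ratio} = \sqrt{(1+\rho)/2}$ gives $\rho = 2\,\text{ratio}^2 - 1$. A small sketch using the two standard errors quoted from Table 9.6; the helper name `implied_antithetic_corr` is hypothetical.

```python
import math

def implied_antithetic_corr(se_antithetic, se_crude):
    """Solve se_antithetic / se_crude = sqrt((1 + rho) / 2) for rho."""
    ratio = se_antithetic / se_crude
    return 2.0 * ratio ** 2 - 1.0

ratio = 0.0066 / 0.0165                        # standard errors quoted from Table 9.6
rho = implied_antithetic_corr(0.0066, 0.0165)
print(round(ratio, 3), round(rho, 2))          # 0.4 -0.68
print(round(1 / math.sqrt(2), 3))              # 0.707: crude Monte Carlo with 2m replications
```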


9.4.5 Extensions and Limitations 


The Monte Carlo approach to pricing path-dependent options is quite general and may be applied to virtually any European derivative security. For example, to price average-rate foreign currency options we would simulate price paths as above (perhaps using a different stochastic process more appropriate for exchange rates), compute the average for each replication, repeat this many times, and compute the average across the replications. Thus the power of the Cox-Ross risk-neutral pricing method is considerable. However, there are several important limitations to this approach that should be emphasized.

First, the Monte Carlo approach may only be applied to European options, options that cannot be exercised early. The early exercise feature of American options introduces the added complication of determining an optimal exercise policy, which must be done recursively using a dynamic-programming-like analysis. In such cases, numerical solution of the corresponding PDE is currently the only available method for obtaining prices.

Second, to apply the Cox-Ross technique to a given derivative security, we must first prove that the security can be priced by arbitrage considerations alone. Recall that in the Black-Scholes framework, the no-arbitrage condition was sufficient to completely determine the option price only because we were able to construct a dynamic portfolio of stocks, bonds, and options that was riskless. In effect, this implies that the option is "spanned" by stocks and bonds or, more precisely, the option's payoff at date $T$ can be perfectly replicated by a particular dynamic trading strategy involving only stocks and bonds. The no-arbitrage condition translates into the requirement that the option price must equal the cost of the dynamic trading strategy that replicates the option's payoff.

But there are situations where the derivative security cannot be replicated by any dynamic strategy involving existing securities. For example, if we assume that the diffusion parameter $\sigma$ in (9.2.2) is stochastic, then it may be shown that without further restrictions on $\sigma$ there exists no nondegenerate dynamic trading strategy involving stocks, bonds, and options that is riskless. Heuristically, because there are now two sources of uncertainty, the option is no longer "spanned" by a dynamic portfolio of stocks and bonds (see Section 9.3.6 and Huang [1992] for further discussion).

Therefore, before we can apply the risk-neutral pricing method to a particular derivative security, we must first check that it is spanned by other traded assets. Since Goldman, Sosin, and Gatto (1979) demonstrate that the option to sell at the maximum is indeed spanned, we can apply the Cox-Ross method to that case with the assurance that the resulting price is in fact the no-arbitrage price and that deviations from this price necessarily imply riskless profit opportunities. But it may be more difficult to verify spanning for more complex path-dependent derivatives. In those cases, we may have to embed the security in a model of economic equilibrium, with specific assumptions about agents' preferences and their investment opportunity sets, as, for example, in the stochastic-volatility model of Section 9.3.6.


9.5 Conclusion 
The pricing of derivative securities is one of the unqualified successes of modern economics. It has changed the way economists view dynamic models of securities prices, and it has had an enormous impact on the investment community. The creation of ever more complex financial instruments has been an important stimulus for academic research and for the establishment of a bona fide "financial engineering" discipline. Recent innovations in derivative securities include average-rate options, more general "lookback" options, barrier options (also known as "down and out" or "birth and death" options), compound options, dual-currency or dual-equity options, synthetic convertible bonds, spread-lock interest rate swaps, rainbow options, and many other exotic securities. In each of these cases, closed-form pricing formulas are available only for a very small set of processes for the underlying asset's price, and a great deal of further research is needed to check whether such processes actually fit the data. Moreover, in many of these cases, analytical expressions for hedging positions in these securities do not exist and must also be determined empirically.

There are many unsettled issues in the statistical inference of continuous-time processes with discretely sampled data. Currently, the most pressing issue is the difficulty in obtaining consistent estimates of the parameters of Itô processes with nonlinear drift and/or diffusion coefficients. For many Itô processes of interest, we do not have closed-form expressions for their transition densities, and hence maximum likelihood estimation is not feasible. The GMM approach of Hansen and Scheinkman (1995) may be the most promising alternative, and empirical applications and Monte Carlo studies are sure to follow.

Another area of active current research involves developing better models of fundamental asset price dynamics. For example, casual empirical observation suggests the presence of jump components in asset prices that are responsible for relatively large and sudden movements, but occur relatively infrequently and are therefore considerably more challenging to estimate precisely. Indeed, there is even some doubt as to whether such jump processes can ever be identified from discretely sampled price data, since the very act of discrete sampling destroys the one clear distinction between diffusion processes and jump processes: the continuity of sample paths.

The difficulties in estimating parametric models of asset price dynamics have led to several attempts to capture the dynamics nonparametrically. For example, by placing restrictions on the drift coefficient of a diffusion process, Aït-Sahalia (1993) proposes a nonparametric estimator of its diffusion coefficient and applies this estimator to the pricing of interest rate options. Longstaff (1995) proposes a test of option-pricing models by focusing on the risk-neutral distribution implicit in option prices. And Hutchinson, Lo, and Poggio (1994) attempt to price derivative securities via neural network models. Although it is still too early to tell if these nonparametric and highly data-intensive methods will offer improvements over their parametric counterparts, the preliminary evidence is quite promising. In Chapter 12, we review some of these techniques and present an application to the pricing and hedging of derivative securities.

Closely related to the issue of stock price dynamics are several open questions regarding the pricing of options in incomplete markets, markets in which the sources of uncertainty affecting the fundamental asset are not spanned by traded securities. For example, if the volatility of the fundamental asset's price is stochastic, it is only under the most restrictive set of assumptions that the price of an option on such an asset may be determined by arbitrage arguments. Since there is almost universal agreement that volatilities do shift over time in random fashion, it is clear that issues regarding market incompleteness are central to the pricing of derivative securities.

See, for example, Ball and Torous (1983, 1985). Merton (1976a) develops an option-pricing formula for combined diffusion/jump processes. See also Merton (1976b) for a more general discussion of the impact of misspecifying stock price dynamics on the pricing of options.

In this chapter we have only touched upon a small set of issues that 
surround derivatives research, those that have received the least attention 
in the extant literature, with the hope that a wider group of academics and 
investment professionals will be encouraged to join in the fray and quicken
the progress in this exciting area. 


Problems—Chapter 9 


9.1 Show that the continuous-time process $p_n(t)$ of Section 9.1.1 converges in distribution to a normally distributed continuous-time process $p(t)$ by calculating the moment-generating function of $p_n(t)$ and taking limits.


9.2 Derive (9.3.30) and (9.3.31) explicitly by evaluating and inverting the Fisher information matrix in (9.3.7) for the maximum likelihood estimators $\hat{\mu}$ and $\hat{\sigma}^2$ of the parameters of a geometric Brownian motion based on sampled data.


9.3 Derive the maximum likelihood estimators $\hat{\mu}$, $\hat{\sigma}^2$, and $\hat{\gamma}$ of the parameters of the trending Ornstein-Uhlenbeck process (9.3.46), and calculate their asymptotic distribution explicitly using (9.3.7). How do these three estimators differ in their asymptotic properties under standard asymptotics and under continuous-record asymptotics?


9.4 You are currently managing a large pension fund and have invested most of it in IBM stock. Exactly one year from now, you will have to liquidate your entire IBM holdings, and you are concerned that it may be an inauspicious time to sell your position. CLM Financial Products Corporation has come to you with the following proposal: For a fee to be negotiated, they will agree to buy your entire IBM holdings exactly one year from now, but at a price per share equal to the maximum of the daily closing prices over the one-year period. What fee should you expect in your negotiations with CLM? Specifically:

9.4.1 Estimate the current (time 0) fair market price $H(0)$ of the option to sell at the maximum using Monte Carlo simulation. For simplicity, assume that IBM's stock price $P(t)$ follows a geometric Brownian motion (9.2.2) so that

$$\log\frac{P(t)}{P(\tau)} \sim \mathcal{N}\left(\mu(t-\tau),\ \sigma^2(t-\tau)\right), \tag{9.5.1}$$

and use daily returns of IBM stock over the most recent five-year period to estimate the parameters $\mu$ and $\sigma^2$ to calibrate your simulations. Assume that there are 253 trading days in a year and that market prices have no volatility when markets are closed, i.e., on weekends and holidays.


9.4.2 Provide a 95% confidence interval for H(0) and an estimate of the 
number of simulations needed to yield a price estimate that is within $.05 
of the true price.


9.4.3 How does this price compare with the price given by the Goldman-Sosin-Gatto formula? Can you explain the discrepancy? Which price would you use to decide whether to accept or reject CLM's proposal?


10 Fixed-Income Securities


The literature on fixed-income securities is vast. We break it into two
main parts. First, in this chapter we introduce basic concepts and discuss 
empirical work on linear time-series models of bond yields. This work is 
only loosely motivated by theory and has the practical aim of exploring the 
forecasting power of the term structure of interest rates. In Chapter 11 we 


turn to more ambitious, fully specified term-structure models that can be 
used to price interest-rate derivative securities. 


10.1 Basic Concepts 


In principle a fixed-income security can promise a stream of future payments of any form, but there are two classic cases.

Zero-coupon bonds, also called discount bonds, make a single payment at a date in the future known as the maturity date. The size of this payment is the face value of the bond. The length of time to the maturity date is the maturity of the bond. US Treasury bills (Treasury obligations with maturity at issue of up to 12 months) take this form.

Coupon bonds make coupon payments of a given fraction of face value at equally spaced dates up to and including the maturity date, when the face value is also paid. US Treasury notes and bonds (Treasury obligations with maturity at issue above 12 months) take this form. Coupon payments on Treasury notes and bonds are made every six months, but the coupon rates for these instruments are normally quoted at an annual rate; thus a 7% Treasury bond actually pays 3.5% of face value every six months up to and including maturity.

Coupon bonds can be thought of as packages of discount bonds, one 
corresponding to each coupon payment and one corresponding to the final 
coupon payment together with the repayment of principal. This is not 
merely an academic concept, as the principal and interest components of 
US Treasury bonds have been traded separately under the Treasury's STRIPS 
(Separate Trading of Registered Interest and Principal Securities) program 
since 1985, and the prices of such Treasury strips at all maturities have been 
reported daily in the Wall Street Journal since 1989.


Fortunately it has increased in quality since Ed Kane's judgement: "It is generally agreed that, ceteris paribus, the fertility of a field is roughly proportional to the quantity of manure that has been heaped upon it in the recent past. By this standard, the term structure of interest rates has become... an extraordinarily fertile field indeed" (Kane 1970). See Melino (1988) or Shiller (1990) for excellent recent surveys, and Sundaresan (1996) for a book-length treatment.

See a textbook such as Fabozzi and Fabozzi (1995) or Fabozzi (1996) for further details on the markets for US Treasury securities.
10.1.1 Discount Bonds


We first define and illustrate basic bond market concepts for discount bonds. 
The yield to maturity on a bond is that discount rate which equates the present value of the bond's payments to its price. Thus if $P_{nt}$ is the time-$t$ price of a discount bond that makes a single payment of one dollar at time $t+n$, and $Y_{nt}$ is the bond's yield to maturity, we have

$$P_{nt} = \frac{1}{(1+Y_{nt})^n}, \tag{10.1.1}$$

so the yield can be found from the price as

$$1 + Y_{nt} = P_{nt}^{-(1/n)}. \tag{10.1.2}$$


It is common in the empirical finance literature to work with log or continuously compounded variables. This has the usual advantage that it transforms the nonlinear equation (10.1.2) into a linear one. Using lowercase letters for logs, the relationship between log yield and log price is

$$y_{nt} = -\left(\frac{1}{n}\right)p_{nt}. \tag{10.1.3}$$
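Equations (10.1.2) and (10.1.3) translate directly into code. A minimal sketch for a hypothetical 10-period discount bond priced at a 5% per-period yield via (10.1.1); the helper names are illustrative.

```python
import math

def discount_yield(price, n):
    """Per-period yield Y_nt of an n-period discount bond, from (10.1.2)."""
    return price ** (-1.0 / n) - 1.0

def log_yield(price, n):
    """Log yield y_nt = -(1/n) p_nt, from (10.1.3), where p_nt = log P_nt."""
    return -math.log(price) / n

price = 1.05 ** -10                            # hypothetical bond priced with (10.1.1)
print(discount_yield(price, 10))               # close to 0.05
print(log_yield(price, 10), math.log(1.05))    # the log yield equals log(1 + Y_nt)
```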


The term structure of interest rates is the set of yields to maturity, at a given time, on bonds of different maturities. The yield spread $S_{nt} \equiv Y_{nt} - Y_{1t}$, or in log terms $s_{nt} \equiv y_{nt} - y_{1t}$, is the difference between the yield on an $n$-period bond and the yield on a one-period bond, a measure of the shape of the term structure. The yield curve is a plot of the term structure, that is, a plot of $Y_{nt}$ or $y_{nt}$ against $n$ on some particular date $t$. The solid line in Figure 10.1 shows the log zero-coupon yield curve for US Treasury securities at the end of January 1987. This particular yield curve rises at first, then falls at longer maturities so that it has a hump shape. This is not unusual, although the yield curve is most commonly upward-sloping over the whole range of maturities. Sometimes the yield curve is inverted, sloping down over the whole range of maturities.

Holding-Period Returns

The holding-period return on a bond is the return over some holding period less than the bond's maturity. In order to economize on notation, we specialize at once to the case where the holding period is a single period. We define $R_{n,t+1}$ as the one-period holding-period return on an $n$-period bond purchased at time $t$ and sold at time $t+1$. Since the bond will be an $(n-1)$-period bond when it is sold, the sale price is $P_{n-1,t+1}$ and the holding-period return is

$$1 + R_{n,t+1} = \frac{P_{n-1,t+1}}{P_{nt}} = \frac{(1+Y_{nt})^n}{(1+Y_{n-1,t+1})^{n-1}}. \tag{10.1.4}$$

The holding-period return in (10.1.4) is high if the bond has a high yield when it is purchased at time $t$ and a low yield when it is sold at time $t+1$ (since a low yield corresponds to a high price).

Moving to logs for simplicity, the log holding-period return, $r_{n,t+1} \equiv \log(1+R_{n,t+1})$, is

$$\begin{align}
r_{n,t+1} &= p_{n-1,t+1} - p_{nt} = n\,y_{nt} - (n-1)\,y_{n-1,t+1} \nonumber\\
&= y_{nt} - (n-1)\left(y_{n-1,t+1} - y_{nt}\right). \tag{10.1.5}
\end{align}$$

The last equality in (10.1.5) shows how the holding-period return is determined by the beginning-of-period yield (positively) and the change in the yield over the holding period (negatively).

This curve is not based on quoted strip prices, which are readily available only for recent years, but is estimated from the prices of coupon-bearing Treasury bonds. Figure 10.1 is due to McCulloch and Kwon (1993) and uses McCulloch's (1971, 1975) estimation method as discussed in Section 10.1.3 below.

Shiller (1990) gives a much more comprehensive treatment, which requires more complicated notation.

Figure 10.1. Zero-Coupon Yield and Forward-Rate Curves in January 1987
Equation (10.1.5) can be rearranged so that it relates the log bond price today to the log price tomorrow and the return over the next period: $p_{nt} = p_{n-1,t+1} - r_{n,t+1}$. One can solve this difference equation forward, substituting out future log bond prices until the maturity date is reached (and noting that the log price at maturity equals zero), to obtain

$$p_{nt} = -\sum_{i=0}^{n-1} r_{n-i,\,t+1+i},$$

or, in terms of the yield,

$$y_{nt} = \frac{1}{n}\sum_{i=0}^{n-1} r_{n-i,\,t+1+i}. \tag{10.1.6}$$

This equation shows that the log yield to maturity on a zero-coupon bond equals the average log return per period if the bond is held to maturity.
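The identities (10.1.5) and (10.1.6) can be checked numerically. The sketch below assumes an arbitrary, hypothetical path of log yields for a single bond held from purchase to maturity, builds log prices from $p = -(\text{periods to maturity})\times y$, and confirms that the first return matches (10.1.5) and that the average return equals the initial log yield; the helper name is our own.

```python
import numpy as np

def log_holding_returns(log_yields):
    """One-period log holding-period returns on a bond held to maturity.

    log_yields[i] is the log yield of the same bond i periods after purchase,
    when it has n - i periods left, i.e. y_{n-i,t+i} for i = 0, ..., n-1.
    """
    y = np.asarray(log_yields, dtype=float)
    n = len(y)
    remaining = n - np.arange(n)              # n, n-1, ..., 1 periods to maturity
    log_p = np.append(-remaining * y, 0.0)    # p = -(maturity) * y; log price 0 at maturity
    return np.diff(log_p)                     # r_{n-i,t+1+i} = p_{n-i-1,t+1+i} - p_{n-i,t+i}

rng = np.random.default_rng(0)
y_path = 0.05 + 0.005 * rng.standard_normal(10)   # hypothetical yields, n = 10
r = log_holding_returns(y_path)

# First return from (10.1.5): y_nt - (n-1)(y_{n-1,t+1} - y_nt)
print(r[0], y_path[0] - 9 * (y_path[1] - y_path[0]))
# (10.1.6): the average return over the life of the bond equals y_nt
print(r.mean(), y_path[0])
```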


Forward Rates

Bonds of different maturities can be combined to guarantee an interest rate on a fixed-income investment to be made in the future; the interest rate on this investment is called a forward rate.

To guarantee at time $t$ an interest rate on a one-period investment to be made at time $t+n$, an investor can proceed as follows. The desired future investment will pay one dollar at time $t+n+1$, so she first buys one $(n+1)$-period bond; this costs $P_{n+1,t}$ at time $t$ and pays one dollar at time $t+n+1$. The investor wants to transfer the cost of this investment from time $t$ to time $t+n$; to do this she sells $P_{n+1,t}/P_{nt}$ $n$-period bonds. This produces a positive cash flow of $P_{nt}(P_{n+1,t}/P_{nt}) = P_{n+1,t}$ at time $t$, exactly enough to offset the negative time-$t$ cash flow from the first transaction. The sale of $n$-period bonds implies a negative cash flow of $P_{n+1,t}/P_{nt}$ at time $t+n$. This can be thought of as the cost of the one-period investment to be made at time $t+n$. The cash flows resulting from these transactions are illustrated in Figure 10.2.

The forward rate is defined to be the return on the time-$(t+n)$ investment of $P_{n+1,t}/P_{nt}$:

$$1 + F_{nt} = \frac{1}{P_{n+1,t}/P_{nt}} = \frac{P_{nt}}{P_{n+1,t}} = \frac{(1+Y_{n+1,t})^{n+1}}{(1+Y_{nt})^{n}}. \tag{10.1.7}$$

In the notation $F_{nt}$ the first subscript refers to the number of periods ahead that the one-period investment is to be made, and the second subscript refers to the date at which the forward rate is set. At the cost of additional complexity in notation we could also define forward rates for multiperiod investments, but we do not pursue this further here.

An example of forward trading is the when-issued market in US Treasury securities. After an auction of new securities is announced but before the securities are issued, the securities are traded in the when-issued market, with settlement to occur when the securities are issued.
Figure 10.2. Cash Flows in a Forward Transaction


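Moving to logs for simplicity, the $n$-period-ahead log forward rate is

$$\begin{align}
f_{nt} &= p_{nt} - p_{n+1,t} \nonumber\\
&= (n+1)\,y_{n+1,t} - n\,y_{nt} \nonumber\\
&= y_{n+1,t} + n\left(y_{n+1,t} - y_{nt}\right) \nonumber\\
&= y_{nt} + (n+1)\left(y_{n+1,t} - y_{nt}\right). \tag{10.1.8}
\end{align}$$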


Equation (10.1.8) shows that the forward rate is positive whenever discount bond prices fall with maturity. Also, the forward rate is above both the $n$-period and the $(n+1)$-period discount bond yields when the $(n+1)$-period yield is above the $n$-period yield, that is, when the yield curve is upward-sloping. This relation between a yield to maturity and the instantaneous forward rate at that maturity is analogous to the relation between marginal and average cost. The yield to maturity is the average cost of borrowing for $n$ periods, while the forward rate is the marginal cost of extending the time period of the loan.
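Equation (10.1.8) gives log forward rates directly from a log yield curve. A minimal sketch, assuming a hypothetical upward-sloping curve of log yields for maturities 1 through 5, that also shows each forward rate sitting above both adjacent yields, as the marginal-versus-average analogy predicts; the helper name is illustrative.

```python
def log_forward_rates(log_yields):
    """n-period-ahead log forward rates from (10.1.8).

    log_yields[k] is y_{k+1,t}; element n-1 of the result is
    f_nt = (n+1) y_{n+1,t} - n y_nt for n = 1, ..., len(log_yields) - 1.
    """
    return [(n + 1) * log_yields[n] - n * log_yields[n - 1]
            for n in range(1, len(log_yields))]

y = [0.040, 0.045, 0.048, 0.050, 0.051]   # hypothetical upward-sloping log yields, n = 1..5
f = log_forward_rates(y)
for n, fn in enumerate(f, start=1):
    # With an upward-sloping curve, f_nt exceeds both y_nt and y_{n+1,t}.
    print(n, round(fn, 4), fn > max(y[n - 1], y[n]))
```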

Figure 10.1 illustrates the relation between the forward-rate curve (shown as a dashed line) and the yield curve (a solid line). The forward-rate curve lies above the yield curve when the yield curve is upward-sloping, and below it when the yield curve is downward-sloping. The two curves cross when the yield curve is flat. These are the standard properties of marginal and average cost curves. When the cost of a marginal unit exceeds the cost of an average unit, then the average cost increases with the addition of the marginal unit, so the average cost rises when the marginal cost is above the average cost. Conversely, the average cost falls when the marginal cost is below the average cost.

As the time unit shrinks relative to the bond maturity $n$, the formula (10.1.8) approaches $f_{nt} = y_{nt} + n\,dy_{nt}/dn$, the $n$-period yield plus $n$ times the slope of the yield curve at maturity $n$.


10.1.2 Coupon Bonds 


As we have already emphasized, a coupon bond can be viewed as a package of discount bonds, one with face value equal to the coupon for each date at which a coupon is paid, and one with the same face value and maturity as the coupon bond itself. Figure 10.3 gives a time line to illustrate the time pattern of payments on a coupon bond.

The price of a coupon bond depends not only on its maturity $n$ and the date $t$, but also on its coupon rate. To keep notation as simple as possible, we define a period as the time interval between coupon payments and $C$ as the coupon rate per period. In the case of US Treasury bonds a period is six months, and $C$ is one half the conventionally quoted annual coupon rate. We write the price of a coupon bond as $P_{cnt}$ to show its dependence on the coupon rate.

The per-period yield to maturity on a coupon bond, $Y_{cnt}$, is defined as that discount rate which equates the present value of the bond's payments to its price, so we have

$$P_{cnt} = \frac{C}{1+Y_{cnt}} + \frac{C}{(1+Y_{cnt})^2} + \cdots + \frac{1+C}{(1+Y_{cnt})^n}. \tag{10.1.9}$$

In the case of US Treasury bonds, where a period is six months, $Y_{cnt}$ is the six-month yield and the annual yield is conventionally quoted as twice $Y_{cnt}$.

Equation (10.1.9) cannot be inverted to get an analytical solution for $Y_{cnt}$. Instead it must be solved numerically, but the procedure is straightforward since all future payments are positive so there is a unique positive real solution for $Y_{cnt}$. Unlike the yield to maturity on a discount bond, the yield to maturity on a coupon bond does not necessarily equal the per-period return if the bond is held to maturity. That return is not even defined until one specifies the reinvestment strategy for coupons received prior to maturity. The yield to maturity equals the per-period return on the coupon bond held to maturity only if coupons are reinvested at a rate equal to the yield to maturity.

The implicit yield formula (10.1.9) simplifies in two important special cases. First, when $P_{cnt} = 1$, the bond is said to be selling at par. In this case the yield just equals the coupon rate: $Y_{cnt} = C$. Second, when maturity $n$ is infinite, the bond is called a consol or perpetuity. In this case the yield just equals the ratio of the coupon rate to the bond price: $Y_{c\infty t} = C/P_{c\infty t}$.

With negative future payments, there can be multiple positive real solutions to (10.1.9). In the analysis of investment projects, the discount rate that equates the present value of a project to its cost is known as the internal rate of return. When projects have some negative cash flows in the future, there can be multiple solutions for the internal rate of return.

Figure 10.3. Calculation of Duration for a Coupon Bond

(1) Maturity:                             1,  2,  ...,  n-1,  n
(2) Face value:                           C,  C,  ...,  C,  1+C
(3) Present value discounted at Y_cnt:    C/(1+Y_cnt),  C/(1+Y_cnt)^2,  ...,  C/(1+Y_cnt)^(n-1),  (1+C)/(1+Y_cnt)^n
(1) × (3):                                C/(1+Y_cnt),  2C/(1+Y_cnt)^2,  ...,  (n-1)C/(1+Y_cnt)^(n-1),  n(1+C)/(1+Y_cnt)^n
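Because all payments in (10.1.9) are positive, the right-hand side is strictly decreasing in the yield, so a simple bracketing root-finder is enough to solve for $Y_{cnt}$ numerically. The sketch below uses bisection (our choice; Newton's method is a common alternative), assumes face value 1 and a per-period coupon $C$ as in the text, and uses an illustrative search interval; the function names are hypothetical.

```python
def coupon_bond_price(y, c, n):
    """Right-hand side of (10.1.9): price of a bond with face value 1."""
    return sum(c / (1 + y) ** i for i in range(1, n + 1)) + 1 / (1 + y) ** n

def coupon_bond_yield(price, c, n, lo=-0.5, hi=10.0, tol=1e-12):
    """Solve (10.1.9) for Y_cnt by bisection; the price falls as the yield rises."""
    if not coupon_bond_price(hi, c, n) < price < coupon_bond_price(lo, c, n):
        raise ValueError("price is outside the bracketing interval [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coupon_bond_price(mid, c, n) > price:
            lo = mid     # model price too high, so the yield must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 10-year Treasury-style bond: 20 semiannual periods, 5% annual coupon.
print(coupon_bond_yield(1.00, 0.025, 20))      # par bond: yield equals the coupon rate
y = coupon_bond_yield(0.90, 0.025, 20)         # bond selling below par
print(y, 2 * y)                                # six-month yield and quoted annual yield
```

The par-bond case doubles as a check: at a price of 1 the solver should return the per-period coupon rate.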


Duration and Immunization 

For discount bonds, maturity measures the length of time that a bondholder has invested money. But for coupon bonds, maturity is an imperfect measure of this length of time because much of a coupon bond's value comes from payments that are made before maturity. Macaulay's duration, due to Macaulay (1938), is intended to be a better measure; like maturity, its units are time periods. To understand Macaulay's duration, think of a coupon bond as a package of discount bonds. Macaulay's duration is a weighted average of the maturities of the underlying discount bonds, where the weight on each maturity is the present value of the corresponding discount bond calculated using the coupon bond's yield as the discount rate:

$$\begin{align}
D_{cnt} &= \frac{\dfrac{C}{1+Y_{cnt}} + \dfrac{2C}{(1+Y_{cnt})^2} + \cdots + \dfrac{n(1+C)}{(1+Y_{cnt})^n}}{P_{cnt}} \nonumber\\
&= \frac{\sum_{i=1}^{n}\dfrac{iC}{(1+Y_{cnt})^i} + \dfrac{n}{(1+Y_{cnt})^n}}{P_{cnt}}. \tag{10.1.10}
\end{align}$$

The maturity of the first component discount bond is one period, and this receives a weight of $C/(1+Y_{cnt})$, the present value of this bond when $Y_{cnt}$ is the discount rate; the maturity of the second discount bond is two, and this receives a weight of $C/(1+Y_{cnt})^2$; and so on until the last discount bond of maturity $n$ gets a weight of $(1+C)/(1+Y_{cnt})^n$. To convert this into an average, we divide by the sum of the weights $C/(1+Y_{cnt}) + C/(1+Y_{cnt})^2 + \cdots + (1+C)/(1+Y_{cnt})^n$, which from (10.1.9) is just the bond price $P_{cnt}$. These calculations are illustrated graphically in Figure 10.3.

When $C = 0$, the bond is a discount bond and Macaulay's duration equals maturity. When $C > 0$, Macaulay's duration is less than maturity and it declines with the coupon rate. For a given coupon rate, duration declines with the bond yield because a higher yield reduces the weight on more distant payments in the average (10.1.10). The duration formula simplifies when a coupon bond is selling at par or has an infinite maturity. A par bond has price $P_{cnt} = 1$ and yield $Y_{cnt} = C$, so duration becomes $D_{cnt} = \left(1-(1+Y_{cnt})^{-n}\right)/\left(1-(1+Y_{cnt})^{-1}\right)$. A consol bond with infinite maturity has yield $Y_{c\infty t} = C/P_{c\infty t}$, so duration becomes $D_{c\infty t} = (1+Y_{c\infty t})/Y_{c\infty t}$.
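A direct implementation of (10.1.10), together with the par-bond and consol shortcuts, can be used to reproduce entries of Table 10.1. This sketch assumes semiannual periods, so the per-period coupon and yield are half the annual rates and duration is halved to express it in years; the function names are our own.

```python
def macaulay_duration(y, c, n):
    """Macaulay's duration (10.1.10) in periods, for face value 1 and per-period coupon c."""
    pv = [c / (1 + y) ** i for i in range(1, n + 1)]   # present values of the coupons
    pv[-1] += 1 / (1 + y) ** n                          # principal repaid with the last coupon
    price = sum(pv)                                     # equals P_cnt by (10.1.9)
    return sum(i * w for i, w in enumerate(pv, start=1)) / price

def par_duration(y, n):
    """Closed form for a bond selling at par (Y_cnt = C)."""
    return (1 - (1 + y) ** -n) / (1 - (1 + y) ** -1)

def consol_duration(y):
    """Closed form for a consol (infinite maturity)."""
    return (1 + y) / y

# Semiannual periods: a 10-year bond has n = 20, and annual rates are halved.
d = macaulay_duration(0.05, 0.025, 20)              # 5% coupon, 10% yield
print(round(d / 2, 3), round(d / 1.05 / 2, 3))      # 7.489 7.132, as in Table 10.1
print(round(par_duration(0.05, 20) / 2, 3))         # 6.543: 10% coupon, 10% yield
print(round(consol_duration(0.05) / 2, 3))          # 10.5: consol at a 10% yield
```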

Numerical examples that illustrate these properties are given in Table 10.1. The table shows Macaulay's duration (and modified duration, defined in (10.1.12) below, in parentheses) for bonds with yields and coupon rates of 0%, 5%, and 10%, and maturities ranging from one year to infinity. Duration is given in years but is calculated using six-month periods, as would be appropriate for US Treasury bonds.

Macaulay also suggests that one could use yields on discount bonds rather than the yield on the coupon bond to calculate the present value of each coupon payment. However, this approach requires that one measure a complete zero-coupon term structure.

Table 10.1. Macaulay's and modified duration for selected bonds.

                                   Maturity (years)
                1          2          5          10          30          ∞

Coupon rate 0%
Yield 0%      1.000      2.000      5.000     10.000      30.000         —
             (1.000)    (2.000)    (5.000)   (10.000)    (30.000)
Yield 5%      1.000      2.000      5.000     10.000      30.000         —
             (0.976)    (1.951)    (4.878)    (9.756)    (29.268)
Yield 10%     1.000      2.000      5.000     10.000      30.000         —
             (0.952)    (1.905)    (4.762)    (9.524)    (28.571)

Coupon rate 5%
Yield 0%      0.988      1.932      4.550      8.417      21.150         —
             (0.988)    (1.932)    (4.550)    (8.417)    (21.150)
Yield 5%      0.988      1.928      4.485      7.989      15.841      20.500
             (0.964)    (1.881)    (4.376)    (7.795)    (15.454)    (20.000)
Yield 10%     0.988      1.924      4.414      7.489      10.957      10.500
             (0.940)    (1.832)    (4.204)    (7.132)    (10.436)    (10.000)

Coupon rate 10%
Yield 0%      0.977      1.875      4.250      7.625      18.938         —
             (0.977)    (1.875)    (4.250)    (7.625)    (18.938)
Yield 5%      0.977      1.868      4.156      7.107      14.025      20.500
             (0.953)    (1.823)    (4.054)    (6.933)    (13.683)    (20.000)
Yield 10%     0.976      1.862      4.054      6.543       9.938      10.500
             (0.930)    (1.773)    (3.861)    (6.231)     (9.465)    (10.000)

The table reports Macaulay's duration and, in parentheses, modified duration for bonds with selected yields and maturities. Duration, yield, and maturity are stated in annual units but the underlying calculations assume that bond payments are made at six-month intervals.

If we take the derivative of (10.1.9) with respect to $Y_{cnt}$, or equivalently with respect to $(1+Y_{cnt})$, we find that Macaulay's duration has another very important property. It is the negative of the elasticity of a coupon bond's price with respect to its gross yield $(1+Y_{cnt})$:

$$D_{cnt} = -\frac{dP_{cnt}}{d(1+Y_{cnt})}\,\frac{(1+Y_{cnt})}{P_{cnt}}. \tag{10.1.11}$$

In industry applications, Macaulay's duration is often divided by the gross yield $(1+Y_{cnt})$ to get what is called modified duration:

$$D^{*}_{cnt} \equiv \frac{D_{cnt}}{1+Y_{cnt}} = -\frac{dP_{cnt}}{dY_{cnt}}\,\frac{1}{P_{cnt}}. \tag{10.1.12}$$


Modified duration measures the proportional sensitivity of a bond's price to a small absolute change in its yield. Thus if modified duration is 10, an increase in the yield of 1 basis point (say from 3.00% to 3.01%) will cause a 10 basis point or 0.10% drop in the bond price.
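The interpretation of modified duration in (10.1.12) can be checked by comparing the first-order prediction $-D^{*}_{cnt}\,\Delta Y_{cnt}$ with an exact repricing. A sketch for a hypothetical 20-period par bond with a 5% per-period coupon, using a central finite difference for $dP_{cnt}/dY_{cnt}$ and the par-bond duration formula from the text as a cross-check; the names and step size are illustrative.

```python
def price(y, c, n):
    """Coupon bond price (10.1.9), face value 1."""
    return sum(c / (1 + y) ** i for i in range(1, n + 1)) + 1 / (1 + y) ** n

def modified_duration(y, c, n, h=1e-6):
    """-(dP/dY)/P from (10.1.12), with dP/dY by central finite difference."""
    dp_dy = (price(y + h, c, n) - price(y - h, c, n)) / (2 * h)
    return -dp_dy / price(y, c, n)

y, c, n = 0.05, 0.05, 20                                # hypothetical par bond, per-period units
d_star = modified_duration(y, c, n)
macaulay = (1 - (1 + y) ** -n) / (1 - (1 + y) ** -1)    # par-bond formula from the text
print(d_star, macaulay / (1 + y))                       # the two should agree

dy = 0.0001                                             # a one-basis-point rise in the yield
approx = -d_star * dy                                   # first-order proportional price change
exact = price(y + dy, c, n) / price(y, c, n) - 1
print(approx, exact)
```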

Macaulay's duration and modified duration are sometimes used to answer the following question: What single coupon bond best approximates the return on a zero-coupon bond with a given maturity? This question is of practical interest because many financial intermediaries have long-term zero-coupon liabilities, such as pension obligations, and they may wish to match or immunize these liabilities with coupon-bearing Treasury bonds. Although today stripped zero-coupon Treasury bonds are available, they may be unattractive because of tax clientele and liquidity effects, so the immunization problem remains relevant. If there is a parallel shift in the yield curve so that bond yields of all maturities move by the same amount, then a change in the zero-coupon yield is accompanied by an equal change in the coupon bond yield. In this case equation (10.1.11) shows that a coupon bond whose Macaulay duration equals the maturity of the zero-coupon liability (equivalently, a coupon bond whose modified duration equals the modified duration of the zero-coupon liability) has, to a first-order approximation, the same return as the zero-coupon liability. This bond, or any portfolio of bonds with the same duration, solves the immunization problem for small, parallel shifts in the term structure.

Although this approach is attractively simple, there are several reasons
why it must be used with caution. First, it assumes that yields of all
maturities move by the same amount, in a parallel shift of the term structure.
We show in Section 10.2.1 that historically, movements in short-term interest
rates have tended to be larger than movements in longer-term bond yields.
Some modified approaches have been developed to handle the more
realistic case where short yields move more than long yields, so that there are
nonparallel shifts in the term structure (see Bierwag, Kaufman, and Toevs
[1983], Granito [1984], Ingersoll, Skelton, and Weil [1978]).

The elasticity of a variable B with respect to a variable A is defined to be the derivative of
B with respect to A, times A/B: (dB/dA)(A/B). Equivalently, it is the derivative of log(B) with
respect to log(A).

Note that if duration is measured in six-month time units, then yields should be measured
on a six-month basis. One can convert to an annual basis by halving duration and doubling
yields. The numbers in Table 10.1 have been annualized in this way.

Immunization was originally defined by Reddington (1952) as "the investment of the
assets in such a way that the existing business is immune to a general change in the rate of
interest". Fabozzi and Fabozzi (1995), Chapter 42, gives a comprehensive discussion.

Second, (10.1.11) and (10.1.12) give first-order derivatives so they apply 
only to infinitesimally small changes in yields. Figure 10.4 illustrates the 
fact that the relationship between the log price and the yield on a bond is 
convex rather than linear. The slope of this relationship, modified duration, 
increases as yields fall (a fact shown also in Table 10.1). This may be taken 
into account by using a second-order derivative. The convexity of a bond is 
defined as 


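$$
\text{Convexity} \equiv \frac{1}{P_{cnt}}\frac{d^{2}P_{cnt}}{dY_{cnt}^{2}}
= \frac{1}{P_{cnt}}\left[\sum_{i=1}^{n}\frac{i(i+1)\,C}{(1+Y_{cnt})^{i+2}} + \frac{n(n+1)}{(1+Y_{cnt})^{n+2}}\right], \tag{10.1.13}
$$

and convexity can be used in a second-order Taylor series approximation of
the price impact of a change in yield:

$$
\frac{dP_{cnt}}{P_{cnt}} \approx \frac{1}{P_{cnt}}\frac{dP_{cnt}}{dY_{cnt}}\,dY_{cnt} + \frac{1}{2}\,\frac{1}{P_{cnt}}\frac{d^{2}P_{cnt}}{dY_{cnt}^{2}}\,(dY_{cnt})^{2}
= (-\text{modified duration})\,dY_{cnt} + \frac{1}{2}\,\text{convexity}\,(dY_{cnt})^{2}. \tag{10.1.14}
$$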
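A sketch of (10.1.13) and (10.1.14): it compares the exact proportional price change for a large yield move with the first- and second-order Taylor approximations. The coupon and yield are taken per period (a hypothetical convention), and the function names and example bond are illustrative assumptions rather than anything from the text.

```python
def bond_price(coupon, yld, n):
    """Price per $1 face of an n-period coupon bond at per-period yield `yld`."""
    return sum(coupon / (1 + yld) ** i for i in range(1, n + 1)) + (1 + yld) ** -n


def modified_duration(coupon, yld, n):
    """-(1/P) dP/dY, using the analytic first derivative of the price."""
    dp = -sum(i * coupon / (1 + yld) ** (i + 1) for i in range(1, n + 1))
    dp -= n / (1 + yld) ** (n + 1)
    return -dp / bond_price(coupon, yld, n)


def convexity(coupon, yld, n):
    """(1/P) d^2P/dY^2, following equation (10.1.13)."""
    d2p = sum(i * (i + 1) * coupon / (1 + yld) ** (i + 2) for i in range(1, n + 1))
    d2p += n * (n + 1) / (1 + yld) ** (n + 2)
    return d2p / bond_price(coupon, yld, n)


def approx_return(coupon, yld, n, dy, second_order=True):
    """Proportional price change from the Taylor expansion (10.1.14)."""
    out = -modified_duration(coupon, yld, n) * dy
    if second_order:
        out += 0.5 * convexity(coupon, yld, n) * dy ** 2
    return out


if __name__ == "__main__":
    c, y, n, dy = 0.025, 0.025, 20, 0.01   # hypothetical bond and a large yield move
    exact = bond_price(c, y + dy, n) / bond_price(c, y, n) - 1
    first = approx_return(c, y, n, dy, second_order=False)
    second = approx_return(c, y, n, dy)
    print(f"exact {exact:.5f}  first-order {first:.5f}  second-order {second:.5f}")
```

For small moves the convexity term is negligible; it matters for large moves, where the first-order approximation overstates the price decline.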


Finally, both Macaulay's duration and modified duration assume that 
cash flows are fixed and do not change when interest rates change. This 
assumption is appropriate for Treasury securities but not for callable securi- 
ties such as corporate bonds or mortgage-backed securities, or for securities 
with default risk if the probability of default varies with the level of interest
rates. By modelling the way in which cash flows vary with interest rates, it is
possible to calculate the sensitivity of prices to interest rates for these more
complex securities; this sensitivity is known as effective duration.

See Fabozzi and Fabozzi (1995), Chapters 28-30, and Fabozzi (1996) for a discussion of
various methods used by fixed-income analysts to calculate effective duration.

Figure 10.4. The Price-Yield Relationship. [Log price plotted against yield; the slope of the curve is labelled "Slope = Modified Duration".]


A Loglinear Model for Coupon Bonds

The idea of duration has also been used in the academic literature to find
approximate linear relationships between log coupon bond yields, holding-period
returns, and forward rates that are analogous to the exact relationships
for zero-coupon bonds. To understand this approach, start from the
loglinear approximate return formula (7.1.19) derived in Chapter 7, and
apply it to the one-period return $r_{c,n,t+1}$ on an $n$-period coupon bond:

$$
r_{c,n,t+1} \approx k + \rho\,p_{c,n-1,t+1} + (1-\rho)\,c - p_{cnt}. \tag{10.1.15}
$$


Here the log nominal coupon $c$ plays the role of the dividend on stock,
but of course it is fixed rather than random. The parameters $\rho$ and $k$ are
given by $\rho \equiv 1/(1+\exp(\overline{c-p}))$ and $k \equiv -\log(\rho) - (1-\rho)\log(1/\rho - 1)$.
When the bond is selling at par, its price is one dollar, so its log price is zero
and $\rho = 1/(1+C) = 1/(1+Y_{cnt})$. It is standard to use this value for $\rho$,
which gives a good approximation for returns on bonds selling close to
par.

One can treat (10.1.15), like the analogous zero-coupon expression 
(10.1.5), as a difference equation in the log bond price. Solving forward to 
the maturity date one obtains 


$$
p_{cnt} \approx \sum_{i=0}^{n-1}\rho^{i}\left[k + (1-\rho)\,c - r_{c,n-i,t+1+i}\right]. \tag{10.1.16}
$$

This equation relates the price of a coupon bond to its stream of coupon
payments and the future returns on the bond. A similar approximation of the
log yield to maturity $y_{cnt}$ shows that it satisfies an equation of the same form:

$$
p_{cnt} \approx \sum_{i=0}^{n-1}\rho^{i}\left[k + (1-\rho)\,c - y_{cnt}\right]
= \frac{(1-\rho^{n})}{(1-\rho)}\left[k + (1-\rho)\,c - y_{cnt}\right]. \tag{10.1.17}
$$


Equations (10.1.16) and (10.1.17) together imply that the $n$-period coupon
bond yield satisfies $y_{cnt} \approx \frac{(1-\rho)}{(1-\rho^{n})}\sum_{i=0}^{n-1}\rho^{i}\,r_{c,n-i,t+1+i}$. Thus,
although there is no exact relationship, there is an approximate equality
between the log yield to maturity on a coupon bond and a weighted average
of the returns on the bond when it is held to maturity.

Equation (10.1.11) tells us that Macaulay's duration for a coupon bond is
minus the derivative of its log price with respect to its log yield. Equation
(10.1.17) gives this derivative as

$$
D_{cnt} = -\frac{dp_{cnt}}{dy_{cnt}} \approx \frac{(1-\rho^{n})}{(1-\rho)} = \frac{1-(1+Y_{cnt})^{-n}}{1-(1+Y_{cnt})^{-1}}, \tag{10.1.18}
$$

where the second equality uses $\rho = 1/(1+Y_{cnt})$. As noted above, this relation
between duration and yield holds exactly for a bond selling at par.
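To see why (10.1.18) is exact only at par, the sketch below computes Macaulay duration directly from discounted cash flows and compares it with $(1-\rho^n)/(1-\rho)$, $\rho = 1/(1+Y)$. A par bond (coupon equal to yield) should match to rounding error, while a zero-coupon bond should not. The helper names and the 3% yield are hypothetical.

```python
def macaulay_duration(coupon, yld, n):
    """Macaulay duration in periods, computed directly from the cash flows."""
    pv = [coupon / (1 + yld) ** i for i in range(1, n + 1)]
    pv[-1] += 1 / (1 + yld) ** n                  # principal repaid at maturity
    return sum(i * v for i, v in enumerate(pv, start=1)) / sum(pv)


def loglinear_duration(yld, n):
    """Right-hand side of (10.1.18): (1 - rho**n) / (1 - rho), rho = 1/(1 + yld)."""
    rho = 1 / (1 + yld)
    return (1 - rho ** n) / (1 - rho)


if __name__ == "__main__":
    y = 0.03
    for n in (2, 10, 40):
        par = macaulay_duration(y, y, n)          # coupon equal to yield: a par bond
        zero = macaulay_duration(0.0, y, n)       # zero-coupon bond: duration equals n
        print(n, round(par, 6), round(loglinear_duration(y, n), 6), zero)
```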

Substituting (10.1.17) and (10.1.18) into (10.1.15), we obtain a loglinear
relation between holding-period returns and yields for coupon bonds:

$$
r_{c,n,t+1} \approx D_{cnt}\,y_{cnt} - (D_{cnt}-1)\,y_{c,n-1,t+1}. \tag{10.1.19}
$$

This equation was first derived by Shiller, Campbell, and Schoenholtz
(1983). It is analogous to (10.1.5) for zero-coupon bonds; maturity in
that equation is replaced by duration here, and of course the two equations
are consistent with one another for a zero-coupon bond whose duration
equals its maturity.

A similar analysis for forward rates shows that the $n$-period-ahead 1-period
forward rate implicit in the coupon-bearing term structure is

$$
f_{cnt} \approx \frac{D_{c,n+1,t}\,y_{c,n+1,t} - D_{cnt}\,y_{cnt}}{D_{c,n+1,t} - D_{cnt}}. \tag{10.1.20}
$$

This formula, which is also due to Shiller, Campbell, and Schoenholtz
(1983), reduces to the discount bond formula (10.1.8) when duration equals
maturity.
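A small check of (10.1.20): plugging in zero-coupon bonds, whose duration equals maturity, should give the discount-bond forward rate $(n+1)y_{n+1,t} - n\,y_{nt}$. The function name and the sample log yields are hypothetical.

```python
def coupon_forward_rate(d_n, y_n, d_next, y_next):
    """Equation (10.1.20): one-period forward rate n periods ahead, computed from
    the durations and log yields of bonds maturing at n and n + 1."""
    return (d_next * y_next - d_n * y_n) / (d_next - d_n)


if __name__ == "__main__":
    # For zero-coupon bonds duration equals maturity, so the formula should
    # collapse to (n + 1) * y_{n+1,t} - n * y_nt.
    n, y_n, y_next = 4, 0.040, 0.042
    print(coupon_forward_rate(n, y_n, n + 1, y_next), (n + 1) * y_next - n * y_n)
```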


Shiller, Campbell, and Schoenholtz (1983) use [...] instead of [...], but these are equivalent to the
same first-order approximation used to derive (10.1.19). They also derive formulas relating
multiperiod holding returns to yields.

10.1.3 Estimating the Zero-Coupon Term Structure


The classic immunization problem is that of finding a coupon bond or 
portfolio of coupon bonds whose return has the same sensitivity to small 
interest-rate movements as the return on a given zero-coupon bond. Alter- 
natively, one can try to find a portfolio of coupon bonds whose cash flows 
exactly match those of a given zero-coupon bond. In general, this portfolio 
will involve shortselling some bonds. This procedure has academic interest 
as well; one can extract an implied zero-coupon term structure from the 
coupon term structure. 

If the complete zero-coupon term structure (that is, the prices of
discount bonds $P_1, \ldots, P_n$ maturing at each coupon date) is known, then it is
easy to find the price of a coupon bond as

$$
P_{cn} = P_1 C + P_2 C + \cdots + P_n(1+C). \tag{10.1.21}
$$

Time subscripts are omitted here and throughout this section to economize
on notation.

Similarly, if a complete coupon term structure (the prices of coupon
bonds $P_{c1}, \ldots, P_{cn}$ maturing at each coupon date) is available, then (10.1.21)
can be used to back out the implied zero-coupon term structure. Starting
with a one-period coupon bond, $P_{c1} = P_1(1+C)$, so $P_1 = P_{c1}/(1+C)$. We
can then proceed iteratively. Given discount bond prices $P_1, \ldots, P_{n-1}$, we
can find $P_n$ as

$$
P_n = \frac{P_{cn} - P_1 C - \cdots - P_{n-1} C}{1+C}. \tag{10.1.22}
$$

Sometimes the coupon term structure may be more-than-complete in
the sense that at least one coupon bond matures on each coupon date
and several coupon bonds mature on some coupon dates. In this case
(10.1.21) restricts the prices of some coupon bonds to be exact functions
of the prices of other coupon bonds. Such restrictions are unlikely to hold
in practice because of tax effects and other market frictions. To handle this,
Carleton and Cooper (1976) suggest adding a bond-specific error term to
(10.1.21) and estimating it as a cross-sectional regression with all the bonds
outstanding at a particular date. If these bonds are indexed $i = 1, \ldots, I$, then
the regression is

$$
P_{cn_i} = P_1 C_i + P_2 C_i + \cdots + P_{n_i}(1+C_i) + u_i, \qquad i = 1, \ldots, I, \tag{10.1.23}
$$

where $C_i$ is the coupon on the $i$th bond and $n_i$ is the maturity of the $i$th bond.
The regressors are coupon payments at different dates, and the coefficients
are the discount bond prices $P_j$, $j = 1, \ldots, N$, where $N = \max_i n_i$ is the longest
coupon bond maturity. The system can be estimated by OLS provided that
the coupon term structure is complete and that $I \ge N$.

Spline Estimation

In practice the term structure of coupon bonds is usually incomplete, and
this means that the coefficients in (10.1.23) are not identified without
imposing further restrictions. It seems natural to impose that the prices of
discount bonds should vary smoothly with maturity. McCulloch (1971, 1975)
suggests that a convenient way to do this is to write $P_n$, regarded as a function
of maturity $P(n)$, as a linear combination of certain prespecified functions:


$$
P_n = P(n) = 1 + \sum_{j=1}^{J} a_j f_j(n). \tag{10.1.24}
$$

McCulloch calls $P(n)$ the discount function. The $f_j(n)$ in (10.1.24) are known
functions of maturity $n$, and the $a_j$ are coefficients to be estimated. Since
$P(0) = 1$, we must have $f_j(0) = 0$ for all $j$.

Substituting (10.1.24) into (10.1.23) and rearranging, we obtain a
regression equation

$$
\Pi_i = \sum_{j=1}^{J} a_j X_{ij} + u_i, \qquad i = 1, \ldots, I, \tag{10.1.25}
$$

where $\Pi_i \equiv P_{cn_i} - 1 - C_i n_i$, the difference between the coupon bond price
and the undiscounted value of its future payments, and
$X_{ij} \equiv f_j(n_i) + C_i \sum_{l=1}^{n_i} f_j(l)$. Like equation (10.1.23), this equation can be
estimated by OLS, but there are now only $J$ coefficients rather than $N$.

A key question is how to specify the functions $f_j(n)$ in (10.1.24). One
simple possibility is to make $P(n)$, the discount function, a polynomial. To do
this one sets $f_j(n) = n^j$. Although a sufficiently high-order polynomial can
approximate any function, in practice one may want to use more parameters
to fit the discount function at some maturities rather than others. For
example, one may want a more flexible approximation in maturity ranges
where many bonds are traded.
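As a sketch of the polynomial case $f_j(n) = n^j$, the code below builds $\Pi_i$ and $X_{ij}$ from (10.1.25) and estimates the $a_j$ by OLS. The bond sample is hypothetical and deliberately incomplete (not every date has a maturing bond), to show that only $J$ coefficients need to be identified. With noiseless prices generated from a cubic discount function, OLS should recover the true coefficients.

```python
import numpy as np


def regressors(coupon, n, J):
    """X_ij = f_j(n_i) + C_i * sum_{l=1}^{n_i} f_j(l), with f_j(n) = n**j."""
    l = np.arange(1, n + 1, dtype=float)
    return np.array([n ** j + coupon * np.sum(l ** j) for j in range(1, J + 1)])


def fit_discount_polynomial(prices, coupons, maturities, J):
    """OLS estimates of a_1..a_J in (10.1.25): Pi_i = sum_j a_j X_ij + u_i."""
    X = np.vstack([regressors(c, n, J) for c, n in zip(coupons, maturities)])
    Pi = (np.asarray(prices, dtype=float) - 1.0
          - np.asarray(coupons, dtype=float) * np.asarray(maturities, dtype=float))
    a_hat, *_ = np.linalg.lstsq(X, Pi, rcond=None)
    return a_hat


def discount_function(n, a):
    """P(n) = 1 + sum_j a_j n**j, i.e. (10.1.24) with f_j(n) = n**j."""
    return 1.0 + sum(a_j * n ** (j + 1) for j, a_j in enumerate(a))


if __name__ == "__main__":
    a_true = np.array([-0.03, 4e-4, -2e-6])         # hypothetical cubic discount function
    mats = [1, 2, 3, 5, 7, 10, 15, 20]               # incomplete: gaps between dates
    cpns = [0.01 + 0.002 * (n % 5) for n in mats]
    prices = [c * sum(discount_function(l, a_true) for l in range(1, n + 1))
              + discount_function(n, a_true) for c, n in zip(cpns, mats)]
    print(fit_discount_polynomial(prices, cpns, mats, J=3))
```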

To meet this need McCulloch suggests that $P(n)$ should be a spline
function. An $r$th-order spline, defined over some finite interval, is a piecewise
$r$th-order polynomial with $r-1$ continuous derivatives; its $r$th derivative
is a step function. The points where the $r$th derivative changes
discontinuously (including the points at the beginning and end of the interval over
which the spline is defined) are known as knot points. If there are $K$ knot
points, there are $K-1$ subintervals in each of which the spline is a
polynomial. The spline has $K-2+r$ free parameters: $r$ for the first subinterval
(whose constant term is pinned down by $P(0) = 1$) and 1 (that determines the
unrestricted $r$th derivative) for each of the $K-2$ following subintervals.
McCulloch suggests that the knot points should be chosen so that each
subinterval contains an equal number of bond maturity dates.

The bond pricing errors are unlikely to be homoskedastic. McCulloch argues that the
standard deviation of $u_i$ is proportional to the bid-ask spread for bond $i$ and thus weights
each observation by the reciprocal of its spread. This is not required for consistency, but may
improve the efficiency of the estimates.

Suits, Mason, and Chan (1978) give an accessible introduction to spline methodology.

If forward rates are to be continuous, the discount function must have at
least one continuous derivative. Hence a quadratic spline, estimated by
McCulloch (1971), is the lowest-order spline that can fit the discount function.
If we require that the forward-rate curve should also be continuously
differentiable, then we need to use a cubic spline, estimated by McCulloch (1975)
and others. McCulloch's papers give the rather complicated formulas for
the functions $f_j(n)$ that make $P(n)$ a quadratic or cubic spline.
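McCulloch's own formulas for $f_j(n)$ are not reproduced here. As a stand-in, the sketch below uses the standard truncated-power basis, which spans the $r$th-order splines with given knots: $n, \ldots, n^r$ plus $(n-k)_+^r$ for each interior knot $k$. That gives $K-2+r$ functions, matching the parameter count above, and each is zero at $n = 0$ so that $P(0) = 1$. Placing interior knots at sample quantiles of maturity is our reading of the equal-count rule; all names and data are hypothetical.

```python
import numpy as np


def equal_count_knots(maturities, K):
    """K knot points: 0, the longest maturity, and K - 2 interior knots at sample
    quantiles, so each subinterval holds roughly the same number of maturities."""
    m = np.asarray(maturities, dtype=float)
    interior = np.quantile(m, np.linspace(0.0, 1.0, K)[1:-1])
    return np.concatenate(([0.0], interior, [m.max()]))


def spline_basis(n, knots, r=2):
    """Truncated-power basis f_j(n) for an r-th order spline with the given knots.

    Columns are n, n**2, ..., n**r and (n - k)_+**r for each interior knot k, so
    there are K - 2 + r of them and every column is zero at n = 0.
    """
    n = np.asarray(n, dtype=float)
    cols = [n ** p for p in range(1, r + 1)]
    cols += [np.maximum(n - k, 0.0) ** r for k in knots[1:-1]]
    return np.column_stack(cols)


def spline_regressors(coupon, n, knots, r=2):
    """X_ij = f_j(n_i) + C_i * sum_{l=1}^{n_i} f_j(l), the regressors in (10.1.25)."""
    at_maturity = spline_basis([n], knots, r)[0]
    summed = spline_basis(np.arange(1, n + 1), knots, r).sum(axis=0)
    return at_maturity + coupon * summed


if __name__ == "__main__":
    maturities = np.arange(1, 61)                  # hypothetical semiannual dates
    knots = equal_count_knots(maturities, K=5)
    print(knots)
    print(spline_regressors(0.025, 20, knots, r=2))    # quadratic spline
    print(spline_regressors(0.025, 20, knots, r=3))    # cubic spline
```

Truncated powers are simple but can be badly conditioned at long maturities; a B-spline basis spans the same space with better numerical behavior.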


Tax Effects 

OLS estimation of (10.1.25) chooses the parameters $a_j$ so that the bond pricing
errors $u_i$ are uncorrelated with the variables $X_{ij}$ that define the discount
function. If a sufficiently flexible spline is used, then the pricing errors will
be uncorrelated with maturity or any nonlinear function of maturity. Pricing
errors may, however, be correlated with the coupon rate, which is the other
defining characteristic of a bond. Indeed McCulloch (1971) found that his 
model tended to underprice bonds that were selling below par because of 
their low coupon rates. 

McCulloch (1975) attributes this to a tax effect. US Treasury bond
coupons are taxed as ordinary income, while price appreciation on a
coupon-bearing bond purchased at a discount is taxed as capital gains. If the capital
gains tax rate $\tau_g$ is less than the ordinary income tax rate $\tau_i$ (as has often
been the case historically), then this can explain a price premium on bonds
selling below par. For an investor who holds a bond to maturity, the pricing
formula (10.1.21) should be modified to

$$
P_{cn} = (1-\tau_i)\,C\sum_{j=1}^{n} P_j + P_n - \tau_g\,(1-P_{cn})\,P_n. \tag{10.1.26}
$$

The spline approach can be modified to handle tax effects like that in
(10.1.26), at the cost of some additional complexity in estimation. Once
tax effects are included, coupon bond prices must be used to construct the
variables $X_{ij}$ on the right-hand side of (10.1.25). This means that the bond
pricing errors are correlated with the regressors, so the equation must be
estimated by instrumental variables rather than simple OLS. Litzenberger
and Rolfo (1984) apply a tax-adjusted spline model of this sort to bond
market data from several different countries.

Adams and van Deventer (1994) argue for the use of a quartic spline, with the
cubic term omitted, in order to maximize the "smoothness" of the forward-rate curve, where
smoothness is defined to be minus the average squared second derivative of the forward-rate
curve with respect to maturity.
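A sketch of the tax-adjusted pricing rule, taking (10.1.26) to read $P_{cn} = (1-\tau_i)C\sum_{j=1}^{n}P_j + P_n - \tau_g(1-P_{cn})P_n$, with $\tau_i$ the ordinary income tax rate and $\tau_g$ the capital gains rate. Because $P_{cn}$ appears on both sides, the function solves for it explicitly. The discount prices and tax rates in the example are hypothetical, and the capital-gains term is intended for bonds bought below par and held to maturity.

```python
def tax_adjusted_price(discount, coupon, tau_i, tau_g):
    """Solve P = (1 - tau_i) C sum_j P_j + P_n - tau_g (1 - P) P_n for P.

    `discount` holds P_1..P_n. Collecting the P terms gives
    P (1 - tau_g P_n) = (1 - tau_i) C sum_j P_j + (1 - tau_g) P_n.
    """
    p_n = discount[-1]
    numer = (1 - tau_i) * coupon * sum(discount) + (1 - tau_g) * p_n
    return numer / (1 - tau_g * p_n)


if __name__ == "__main__":
    disc = [0.97, 0.94, 0.91, 0.88]               # hypothetical P_1..P_4
    for c in (0.00, 0.02, 0.04):
        print(c, round(tax_adjusted_price(disc, c, tau_i=0.35, tau_g=0.20), 6))
```

With both tax rates set to zero the function reduces to the untaxed formula (10.1.21).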

The tax-adjusted spline model assumes that the same tax rates are rel- 
evant for all bonds. The model cannot handle "clientele" effects, in which 
differently taxed investors specialize in different bonds. Schaefer (1981,
1982) suggests that clientele effects can be handled by first finding a set of
tax-efficient bonds for an investor in a particular tax bracket, then estimating
an implied zero-coupon yield curve from those bonds alone.


Nonlinear Models 

Despite the flexibility of the spline approach, spline functions have some 
unappealing properties. First, since splines are polynomials they imply a
discount function which diverges as maturity increases rather than going
to zero as required by theory. Implied forward rates also diverge rather
than converging to any fixed limit. Second, there is no simple way to ensure
that the discount function always declines with maturity (i.e., that all forward
rates are positive). The forward curve illustrated in Figure 10.1 goes negative
at a maturity of 27 years, and this behavior is not uncommon (see Shea
[1984]). These problems are related to the fact that a flat zero-coupon
yield curve implies an exponentially declining discount function, which is
not easily approximated by a polynomial function. Since any plausible yield
curve flattens out at the long end, splines are likely to have difficulties with
longer-maturity bonds.

These difficulties have led some authors to suggest nonlinear alternatives
to the linear specification (10.1.24). One alternative, suggested by
Vasicek and Fong (1982), is to use an exponential spline, a spline applied to
a negative exponential transformation of maturity. The exponential spline
has the desirable property that forward rates and zero-coupon yields
converge to a fixed limit as maturity increases. More generally, a flat yield curve
is easy to fit with an exponential spline.

Although the exponential spline is appealing in theory, it is not clear that
it performs better than the standard spline in practice (see Shea [1985]).
The exponential spline does not make it easier to restrict forward rates to
be positive. As for its long-maturity behavior, it is important to remember
that forward rates cannot be directly estimated beyond the maturity of the
longest coupon bond; they can only be identified by restricting the relation
between long-horizon forward rates and shorter-horizon forward rates. The
exponential spline, like the standard spline, fits the observed maturity range
flexibly, leaving the limiting forward rate and speed of convergence to this
rate to be determined more by the restrictions of the spline than by any
characteristics of the long-horizon data. Since the exponential spline
involves nonlinear estimation of a parameter used to transform maturity, it is
more difficult to use than the standard spline, and this cost may outweigh the
exponential spline's desirable long-horizon properties. In any case, forward-rate
and yield curves should be treated with caution if they are extrapolated
beyond the maturity of the longest traded bond.

Some other authors have solved the problem of negative forward rates
by restricting the shape of the zero-coupon yield curve. Nelson and Siegel
(1987), for example, model the instantaneous forward rate at maturity $n$
as the solution to a second-order differential equation with equal roots:
$f(n) = \beta_0 + \beta_1 \exp(-\alpha n) + \alpha n\,\beta_2 \exp(-\alpha n)$. This implies that the discount
function is double-exponential:

$$
P(n) = \exp\!\left[-\beta_0 n - (\beta_1+\beta_2)\,\frac{1-\exp(-\alpha n)}{\alpha} + n\,\beta_2 \exp(-\alpha n)\right].
$$

This specification generates forward-rate and yield curves with a desirable
range of shapes, including upward-sloping, inverted, and hump-shaped.
Svensson (1994) has developed this specification further. Other recent work
has generated bond-price formulas from fully specified general-equilibrium
models of the term structure, which we discuss in Chapter 11.
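A sketch of the Nelson and Siegel (1987) curves: the instantaneous forward rate $f(n) = \beta_0 + \beta_1 e^{-\alpha n} + \alpha n \beta_2 e^{-\alpha n}$, the discount function obtained by integrating it, $P(n) = \exp(-\int_0^n f(s)\,ds)$, and the implied continuously compounded zero-coupon yield $-\log P(n)/n$, which tends to $\beta_0$ as maturity grows. The parameter values are hypothetical; fitting them to bond prices would require nonlinear least squares, which is not shown.

```python
import math


def ns_forward(n, beta0, beta1, beta2, alpha):
    """Instantaneous forward rate f(n) = b0 + b1 e^{-a n} + a n b2 e^{-a n}."""
    e = math.exp(-alpha * n)
    return beta0 + beta1 * e + alpha * n * beta2 * e


def ns_discount(n, beta0, beta1, beta2, alpha):
    """P(n) = exp(-integral_0^n f(s) ds), written out in closed form."""
    e = math.exp(-alpha * n)
    log_p = -beta0 * n - (beta1 + beta2) * (1 - e) / alpha + n * beta2 * e
    return math.exp(log_p)


def ns_yield(n, *params):
    """Continuously compounded zero-coupon yield -log P(n) / n, for n > 0."""
    return -math.log(ns_discount(n, *params)) / n


if __name__ == "__main__":
    params = (0.06, -0.02, 0.01, 0.5)    # hypothetical b0, b1, b2, alpha (n in years)
    for n in (0.25, 1, 2, 5, 10, 30):
        print(n, round(ns_forward(n, *params), 5), round(ns_yield(n, *params), 5))
```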


10.2 Interpreting the Term Structure of Interest Rates 


There is a large empirical literature which tests statements about expected- 
return relationships among bonds without deriving these statements from 
a fully specified equilibrium model. For simplicity we discuss this literature 
assuming that zero-coupon bond prices are observed or can be estimated 
from coupon bond prices. 


10.2.1 The Expectations Hypothesis 


The most popular simple model of the term structure is known as the
expectations hypothesis. We distinguish the pure expectations hypothesis (PEH),
which says that expected excess returns on long-term over short-term bonds
are zero, from the expectations hypothesis (EH), which says that expected
excess returns are constant over time. This terminology is due to Lutz (1940).


Different Forms of the Pure Expectations Hypothesis 

We also distinguish different forms of the PEH, according to the time hori- 
zon over which expected excess returns are zero. A first form of the PEH 
equates the one-period expected returns on one-period and n-period bonds. 
The one-period return on a one-period bond is known in advance to be
$(1+Y_{1t})$, so this form of the PEH implies

$$
(1+Y_{1t}) = E_t[1+R_{n,t+1}] = (1+Y_{nt})^{n}\,E_t\!\left[(1+Y_{n-1,t+1})^{-(n-1)}\right], \tag{10.2.1}
$$

where the second equality follows from the definition of holding-period
return and the fact that $(1+Y_{nt})$ is known at time $t$.

A second form of the PEH equates the $n$-period expected returns on
one-period and $n$-period bonds:

$$
(1+Y_{nt})^{n} = E_t\!\left[(1+Y_{1t})(1+Y_{1,t+1})\cdots(1+Y_{1,t+n-1})\right]. \tag{10.2.2}
$$

Here $(1+Y_{nt})^{n}$ is the $n$-period return on an $n$-period bond, which equals
the expected return from rolling over one-period bonds for $n$ periods. It is
straightforward to show that if (10.2.2) holds for all $n$, it implies

$$
(1+F_{n-1,t}) \equiv \frac{(1+Y_{nt})^{n}}{(1+Y_{n-1,t})^{n-1}} = E_t[1+Y_{1,t+n-1}]. \tag{10.2.3}
$$

Under this form of the PEH, the $(n-1)$-period-ahead one-period forward
rate equals the expected $(n-1)$-period-ahead spot rate.

It is also straightforward to show that if (10.2.2) holds for all $n$, it implies

$$
(1+Y_{1t})^{-1} = (1+Y_{nt})^{-n}\,E_t\!\left[(1+Y_{n-1,t+1})^{n-1}\right]. \tag{10.2.4}
$$

But (10.2.4) is inconsistent with (10.2.1) whenever interest rates are random.
The problem is that by Jensen's Inequality, the expectation of the reciprocal
of a random variable is not the reciprocal of the expectation of that random
variable. Thus the pure expectations hypothesis cannot hold in both its
one-period form and its $n$-period form.

One can understand this problem more clearly by assuming that interest
rates are lognormal and homoskedastic and taking logs of the one-period
PEH equation (10.2.1) and the $n$-period PEH equation (10.2.4). Noting
that from equation (10.1.5) the excess one-period log return on an $n$-period
bond is

$$
r_{n,t+1} - y_{1t} = (y_{nt} - y_{1t}) - (n-1)(y_{n-1,t+1} - y_{nt}), \tag{10.2.5}
$$

equation (10.2.1) implies that

$$
E_t[r_{n,t+1} - y_{1t}] = -\tfrac{1}{2}\,\mathrm{Var}_t[r_{n,t+1} - y_{1t}], \tag{10.2.6}
$$

while (10.2.4) implies that

$$
E_t[r_{n,t+1} - y_{1t}] = \tfrac{1}{2}\,\mathrm{Var}_t[r_{n,t+1} - y_{1t}]. \tag{10.2.7}
$$


The difference between the right-hand sides of (10.2.6) and (10.2.7) is
$\mathrm{Var}[r_{n,t+1} - y_{1t}]$, which measures the quantitative importance of the Jensen's
Inequality effect in a lognormal homoskedastic model.

Cox, Ingersoll, and Ross (1981a) make this point very clearly. They also argue that in
continuous time, only expected equality of instantaneous returns (a model corresponding to
(10.2.1)) is consistent with the absence of arbitrage. But McCulloch (1993) has shown that
this result depends on restrictive assumptions and does not hold in general.

Table 10.2. Means and standard deviations of term-structure variables.

                                     Long bond maturity (n)
Variable                      2        3        6       12       24       60      120
Excess return             0.385    0.564        ?        ?    0.700        ?   -0.048
  r(n,t+1) - y(1,t)         (?)  (1.222)  (2.954)  (6.218)  (11.33)   (19.0)  (37.08)
Change in yield           0.010    0.010    0.010    0.010    0.011    0.011    0.012
  y(n,t+1) - y(n,t)         (?)  (0.576)  (0.570)      (?)      (?)      (?)  (0.510)
Change in yield          -0.188   -0.119   -0.056   -0.014    0.011    0.011    0.012
  y(n-1,t+1) - y(n,t)       (?)  (0.586)  (0.573)  (0.555)  (0.488)      (?)  (0.310)
Yield spread              0.197    0.326    0.570    0.765    0.958    1.153    1.367
  y(n,t) - y(1,t)       (0.212)  (0.303)  (0.438)  (0.594)  (0.797)  (1.012)  (1.237)

Long bond maturities are measured in months. For each variable the table reports the sample
mean and sample standard deviation (in parentheses) using monthly data over the period
1952:1 to 1991:2. The units are annualized percentage points. The underlying data are
zero-coupon bond yields from McCulloch and Kwon (1993).

Table 10.2 reports unconditional sample means and standard deviations
for several term-structure variables over the period 1952:1 to 1991:2. All
data are monthly, but are measured in annualized percentage points; that
is, the raw variables are multiplied by 1200. The first row shows the mean
and standard deviation of excess returns on $n$-month zero-coupon bonds
over one-month bills. The mean excess return is positive and rising with
maturity at first, but it starts to fall at a maturity of one year and is even
slightly negative for ten-year zero-coupon bonds.

This pattern can be understood by breaking excess returns into the two
components on the right-hand side of equation (10.2.5): the yield spread
$(y_{nt} - y_{1t})$ between $n$-period and one-period bonds, and $-(n-1)$ times the
change in yield $(y_{n-1,t+1} - y_{nt})$ on the $n$-period bond. Interest rates of all
fixed maturities rise during the sample period, as shown in the second row
of Table 10.2 and illustrated for one-month and ten-year rates in Figure 10.5.
At the short end of the term structure this effect is offset by the decline in
maturity from $n$ to $n-1$ as the bond is held for one month; thus the change
in yield $(y_{n-1,t+1} - y_{nt})$, shown in the third row of Table 10.2, is negative for
short bonds, contributing positively to their return. At the long end of the
term structure, however, the decline in maturity from $n$ to $n-1$ is negligible,
and so the change in yield $(y_{n-1,t+1} - y_{nt})$ is positive, causing capital losses on
long zero-coupon bonds which outweigh the higher yields offered by these
bonds, shown in the fourth row of Table 10.2.

Table 10.2 is an expanded version of a table shown in Campbell (1995). The numbers
given here are slightly different from the numbers in that paper because the sample period
used in that paper was 1951:1 to 1990:2, although it was erroneously reported to be 1952:1 to
1991:2.

Figure 10.5. Short- and Long-Term Interest Rates 1952 to 1991. [One-month and ten-year rates plotted over the sample period.]

The standard deviation of excess returns rises rapidly with maturity. If 
excess bond returns are white noise, then the standard error of the sample 
mean is the standard deviation divided by the square root of the sample size
(469 months). The standard error for $n = 2$ is only 0.03%, whereas the
standard error for $n = 120$ is 1.71%. Thus the pattern of mean returns is
imprecisely estimated at long maturities.

The standard deviation of excess returns also determines the size of the
wedge between the one-period and $n$-period forms of the pure expectations
hypothesis. The difference between mean annualized excess returns under
(10.2.6) and (10.2.7) is only 0.00075 for $n = 2$. It is still only 0.11% for
$n = 24$. But it rises to 1.15% for $n = 120$. This calculation shows that
the differences between different forms of the PEH are small except for
very long-maturity zero-coupon bonds. Since these bonds have the most
imprecisely estimated mean returns, the data reject all forms of the PEH
at the short end of the term structure, but reject no forms of the PEH at
the long end of the term structure. In this sense the distinction between
different forms of the PEH is not critical for evaluating this hypothesis.

Investors who seek to profit from this tendency of bond yields to fall as maturity shrinks
are said to be "riding the yield curve".
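The arithmetic behind the standard errors and the one-period versus $n$-period wedge can be reproduced from the standard deviations in Table 10.2. The sketch assumes, as the text does, white-noise excess returns and a sample of 469 months; it also assumes the wedge $\mathrm{Var}[r_{n,t+1} - y_{1t}]$ is computed in monthly units and then annualized by 1200, like the table entries. Up to rounding, it should give about 1.71% for the standard error at $n = 120$, and about 0.11% and 1.15% for the wedge at $n = 24$ and $n = 120$. The function names are hypothetical.

```python
import math


def standard_error(sd, T):
    """Standard error of a sample mean when the series is white noise."""
    return sd / math.sqrt(T)


def jensen_gap(sd_annualized, scale=1200.0):
    """Var[r_{n,t+1} - y_1t] expressed in annualized percentage points.

    Table 10.2 multiplies monthly log returns by 1200, so a reported standard
    deviation sd corresponds to sd / 1200 in raw monthly units; the monthly
    variance is then multiplied by 1200 to match the table's units.
    """
    monthly_sd = sd_annualized / scale
    return scale * monthly_sd ** 2


if __name__ == "__main__":
    T = 469
    for n, sd in ((24, 11.33), (120, 37.08)):       # standard deviations from Table 10.2
        print(n, round(standard_error(sd, T), 2), round(jensen_gap(sd), 2))
```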

Most empirical research uses neither the one-period form of the PEH 
(10.2.6), nor the n-period form (10.2.7), but a log form of the PEH that 
equates the expected log returns on bonds of all maturities: 


$$
E_t[r_{n,t+1} - y_{1t}] = 0. \tag{10.2.8}
$$


This model is halfway between equations (10.2.6) and (10.2.7) and can be 
justified as an approximation to either of them when variance terms are 
small. Alternatively, it can be derived directly as in McCulloch (1993). 


Implications of the Log Pure Expectations Hypothesis 

Once the PEH is formulated in logs, it is comparatively easy to state its 
implications for longer-term bonds. The log PEH implies, first, that the 
one-period log yield (which is the same as the one-period return on a one- 
period bond) should equal the expected log holding return on a longer 
$n$-period bond held for one period:

$$
y_{1t} = E_t[r_{n,t+1}]. \tag{10.2.9}
$$

Second, a long-term $n$-period log yield should equal the expected average of
$n$ successive log yields on one-period bonds which are rolled over for $n$
periods:

$$
y_{nt} = \frac{1}{n}\sum_{i=0}^{n-1} E_t[y_{1,t+i}]. \tag{10.2.10}
$$

Finally, the $(n-1)$-period-ahead one-period log forward rate should equal
the expected one-period spot rate $(n-1)$ periods ahead:

$$
f_{n-1,t} = E_t[y_{1,t+n-1}]. \tag{10.2.11}
$$

This implies that the log forward rate for a one-period investment to be
made at a particular date in the future should follow a martingale:

$$
f_{n-1,t} = E_t[y_{1,t+n-1}] = E_t\!\left[E_{t+1}[y_{1,t+n-1}]\right] = E_t[f_{n-2,t+1}]. \tag{10.2.12}
$$


If any of equations (10.2.9), (10.2.10), and (10.2.11) hold for all $n$ and $t$,
then the other equations also hold for all $n$ and $t$. Also, if any of these
equations hold for $n = 2$ at some date $t$, then the other equations also
hold for $n = 2$ and the same date $t$. Note, however, that (10.2.9) through (10.2.11)
are not generally equivalent for particular $n$ and $t$.
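Under a perfect-foresight simplification (expected short rates equal a known path), the sketch below builds the log yield curve implied by (10.2.10) and recovers one-period log forward rates from it using $f_{n-1,t} = n\,y_{nt} - (n-1)\,y_{n-1,t}$; these should reproduce the short-rate path, as (10.2.11) requires. The path and the function names are hypothetical.

```python
import numpy as np


def peh_yields(expected_short):
    """(10.2.10): y_nt = (1/n) sum_{i=0}^{n-1} E_t[y_{1,t+i}] for n = 1, ..., N."""
    s = np.asarray(expected_short, dtype=float)
    return np.cumsum(s) / np.arange(1, len(s) + 1)


def implied_forwards(yields):
    """One-period log forwards f_{n-1,t} = n y_nt - (n-1) y_{n-1,t}.

    Entry k is the forward rate for the period starting k periods ahead
    (entry 0 is simply y_1t).
    """
    y = np.asarray(yields, dtype=float)
    total = np.arange(1, len(y) + 1) * y          # n * y_nt, i.e. minus the log price
    return np.diff(np.concatenate(([0.0], total)))


if __name__ == "__main__":
    path = [0.040, 0.043, 0.047, 0.050, 0.051]    # hypothetical expected short rates
    y_t = peh_yields(path)
    print(y_t)
    print(implied_forwards(y_t))                   # reproduces the path, as in (10.2.11)
```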


Alternatives to the Pure Expectations Hypothesis

The expectations hypothesis (EH) is more general than the PEH in that
it allows the expected returns on bonds of different maturities to differ
by constants, which can depend on maturity but not on time. The differences
between expected returns on bonds of different maturities are sometimes
called term premia. The PEH says that term premia are zero, while the EH
says that they are constant through time. Like the PEH, the EH can be
formulated for one-period simple returns, for $n$-period simple returns, or
for log returns. If bond returns are lognormal and homoskedastic, as in
Singleton (1990), then these formulations are consistent with one another
because the Jensen's Inequality effects are constant over time. Recent
empirical research typically concentrates on the log form of the EH.

Early discussions of the term structure tended to ignore the possibility
that term premia might vary over time, concentrating instead on their sign.
Hicks (1946) and Lutz (1940) argued that lenders prefer short maturities
while borrowers prefer long maturities, so that long bonds should have
higher average returns than short bonds. Modigliani and Sutch (1966)
argued that different lenders and borrowers might have different preferred
habitats, so that term premia might be negative as well as positive. All these
authors disputed the PEH but did not explicitly question the EH. More
recent work has used intertemporal asset pricing theory to derive both the
average sign and the time-variation of term premia; we discuss this work in
Chapter 11.


10.2.2 Yield Spreads and Interest Rate Forecasts


We now consider empirical evidence on the expectations hypothesis (EH). 
Since the EH allows constant differences in the expected returns on short. 
and long-term bonds, it does not restrict constant terms so for convenie 
we drop constants from all equations in this section, 

So far we have Stated the implications of the expectations hypothesis 
for the levels of nominal interest rates. In post-World War JI US data, nom- 
inal interest rates cem to follow a highly persistent process with a root very 
close to unity, so much empirical work uses yield spreads instead of yield 
levyls. 2 

| ` 

? This usage is the most common one in the literature, Fama (1984), Fama (1990), and 
Fana and Bliss (1987), however, use "term pre 
iela returns on long-term bonds, 


+See Chapters 2 and 7 for a discussion at anit roots; The 


nce 


mia" to refer 10 realized, rather than expected, 


persistence of the shottrate 
i 

! 

| 
! 
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Recall that the yield spread between the period yield and the onc- 


period yield is ½% = 5 — yi. Equation (10.1.6) implies that 
1 . n 
Sy = E К, у, [o — xi + Onit atte vid] 
ist 
1 : n І 1 
= (+) E 2 [(n = DAY i, + Fuel unc 352] . (10.2.33) 


The second equality in equation (10.2.13) replaces multiperiod interest rate 
changes by sums of single-period interest rate changes. The equation says 
that the yield spread equals a weighted average of expected future interest 
rate changes, plus an unweighted average of expected future excess returns 
on long bonds. If changes in interest rates are stationary (that is, if interest 
rates themselves have one unit root but not two), and if excess returns 
are stationary (as would be implied by any model in which risk aversion 
and bonds' risk characteristics are stationary), then the yield spread is also 
stationary. This means that yields of different maturities are cointegrated.
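The second equality in (10.2.13) rests on an algebraic identity: $(1/n)\sum_{i=0}^{n-1}(y_{1,t+i} - y_{1t})$ equals $\sum_{i=1}^{n-1}(1 - i/n)\Delta y_{1,t+i}$. The minimal sketch below checks this on a simulated path of one-period yields. The function names and the random path are hypothetical illustrations, not data from the text. It also prints the weights $1 - i/n$, which fall linearly in i, so near-term short-rate changes count most.

```python
import random

def average_form(short_rates, n):
    """(1/n) * sum_{i=0}^{n-1} (y_{1,t+i} - y_{1t}) for a path starting at index 0."""
    return sum(short_rates[i] - short_rates[0] for i in range(n)) / n

def weighted_change_form(short_rates, n):
    """sum_{i=1}^{n-1} (1 - i/n) * (y_{1,t+i} - y_{1,t+i-1})."""
    return sum((1 - i / n) * (short_rates[i] - short_rates[i - 1]) for i in range(1, n))

rng = random.Random(0)
n = 12
# Hypothetical path of one-period yields y_{1t}, y_{1,t+1}, ..., y_{1,t+n-1}.
path = [0.05 + rng.gauss(0.0, 0.01) for _ in range(n)]

print(average_form(path, n), weighted_change_form(path, n))
print([round(1 - i / n, 3) for i in range(1, n)])  # weights decline linearly in i
```

The identity holds path by path, so it survives taking conditional expectations. That is why (10.2.13) can be read as a weighted forecast of short-rate changes plus a risk-premium term.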

The expectations hypothesis says that the second term on the right-hand side of (10.2.13) is constant. This has important implications for the 
relation between the yield spread and future interest rates. It means that the 
yield spread is (up to a constant) the optimal forecaster of the change in the 
long-bond yield over the life of the short bond, and the optimal forecaster 
of changes in short rates over the life of the long bond. Recalling that we 
have dropped all constant terms, the relations are

$$ E_t[y_{n-1,t+1} - y_{nt}] = \frac{s_{nt}}{n-1} \tag{10.2.14} $$

and

$$ s_{nt} = E_t \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \Delta y_{1,t+i}. \tag{10.2.15} $$


Equation (10.2.14) can be obtained by substituting the definition of $r_{n,t+1}$, 
(10.1.5), into (10.2.9) and rearranging. It shows that when the yield spread 
is high, the long rate is expected to rise. This is because a high yield spread 
gives the long bond a yield advantage which must be offset by an anticipated 
capital loss. Such a capital loss can only come about through an increase 
in the long-bond yield. Equation (10.2.15) follows directly from (10.2.13) 
with constant expected excess returns. It shows that when the yield spread is


process is discussed further in Chapter 11.
See Campbell and Shiller (1987) for a discussion of cointegration in the term structure 
of interest rates.
Table 10.3. Regression coefficients $\beta_n$ and $\gamma_n$.

                                     Long bond maturity (n)
Dependent variable         2        3        6       12       24       48      120
Long-yield changes         …   -0.145   -0.835   -1.435   -1.448   -2.262   -4.226
  (10.2.16)              (…)  (0.282)  (0.442)  (0.599)  (1.004)  (1.458)  (2.076)
Short-rate changes     0.502    0.467    0.320    0.272    0.363    0.442    1.402
  (10.2.18)          (0.096)  (0.148)  (0.146)  (0.208)  (0.223)  (0.384)  (0.147)

Long bond maturities are measured in months. The first row reports the estimated regression 
coefficient $\beta_n$ from (10.2.16), with an asymptotic standard error (in parentheses) calculated to 
allow for heteroskedasticity in the manner described in the Appendix. The second row reports 
the estimated regression coefficient $\gamma_n$ from (10.2.18), with an asymptotic standard error 
calculated in the same manner, allowing also for residual autocorrelation. The expectations 
hypothesis of the term structure implies that both $\beta_n$ and $\gamma_n$ should equal one. The underlying 
data are monthly zero-coupon bond yields over the period 1952:1 to 1991:2, from McCulloch 
and Kwon (1993).


high, short rates are expected to rise so that the average short rate over the 
life of the long bond equals the initial long-bond yield. Near-term increases 
in short rates are given greater weight than further-off increases, because 
they affect the level of short rates during a greater part of the life of the long 
bond.


Yield Spreads and Future Long Rates 
Equation (10.2.14), which says that high yield spreads should forecast increases in long rates, fares poorly in the data. Macaulay (1938) first noted 
the fact that high yield spreads actually tend to precede decreases in long 
rates. He wrote: "The yields of bonds of the highest grade should fall during 
a period in which short-term rates are higher than the yields of the bonds 
and rise during a period in which short-term rates are lower. Now experience 
is more nearly the opposite" (Macaulay [1938, p. 33]).

Table 10.3 reports estimates of the coefficient $\beta_n$ and its standard error 
in the regression

$$ y_{n-1,t+1} - y_{nt} = \alpha_n + \beta_n \left(\frac{s_{nt}}{n-1}\right) + \epsilon_{nt}. \tag{10.2.16} $$

The maturity n varies from 2 months to 120 months (10 years). According


For maturities above one year the table uses the approximation $y_{n-1,t+1} \approx y_{n,t+1}$. Note


to the expectations hypothesis, we should find $\beta_n = 1$. In fact all the 
estimates in Table 10.3 are negative; all are significantly less than one, and 
some are significantly less than zero. When the long-short yield spread is 
high the long yield tends to fall, amplifying the yield differential between 
long and short bonds, rather than rising to offset the yield differential as 
required by the expectations hypothesis.

The regression equation (10.2.16) contains the same information as a 
regression of the excess one-period return on an n-period bond onto the 
yield spread $s_{nt}$. Equation (10.2.5) relating excess returns to yields implies 
that the excess-return regression would have a coefficient of $(1 - \beta_n)$. Thus 
the negative estimates of $\beta_n$ in Table 10.3 correspond to a strong positive 
relationship between yield spreads and excess returns on long bonds. This is 
similar to the positive relationship between dividend yields and stock returns 
discussed in Chapter 7.
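Because the excess one-period return satisfies $r_{n,t+1} - y_{1t} = s_{nt} - (n-1)(y_{n-1,t+1} - y_{nt})$, the slope of the excess-return regression on $s_{nt}$ equals $1 - \beta_n$ in any sample, not only in population. The minimal check below uses simulated inputs. The series `spread` and `d_long` and their parameters are hypothetical, not the McCulloch-Kwon data.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
n, T = 24, 500
# Hypothetical yield spreads s_nt and long-yield changes y_{n-1,t+1} - y_nt.
spread = [rng.gauss(0.01, 0.005) for _ in range(T)]
d_long = [-0.5 * s / (n - 1) + rng.gauss(0.0, 0.002) for s in spread]

# beta_n from regression (10.2.16): long-yield change on s_nt / (n - 1).
beta_n = ols_slope([s / (n - 1) for s in spread], d_long)

# Excess one-period return on the n-period bond: s_nt - (n - 1) * (y_{n-1,t+1} - y_nt).
excess = [s - (n - 1) * d for s, d in zip(spread, d_long)]
slope_excess = ols_slope(spread, excess)

print(round(beta_n, 4), round(slope_excess, 4), round(1 - beta_n, 4))
```

Nothing here depends on the simulated numbers. The equality is an OLS identity, which is why the two regressions carry the same information.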

One difficulty with the regression (10.2.16) is that it is particularly sensitive to measurement error in the long-term interest rate (see Stambaugh 
[1988]). Since the long rate appears both in the regressor with a positive 
sign and in the dependent variable with a negative sign, measurement error 
would tend to produce the negative signs found in Table 10.3. Campbell 
and Shiller (1991) point out that this can be handled by using instrumental variables regression where the instruments are correlated with the yield 
spread but not with the bond yield measurement error. They try a variety


of instruments and find that the negative regression coefficients are quite 
robust. 
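A small simulation illustrates the measurement-error point. The sketch below assumes, purely for illustration, that the true data satisfy the expectations hypothesis ($\beta_n = 1$). It also assumes that each observed long yield equals the true yield plus independent noise with standard deviation `sigma_e`. The same noise in $y_{nt}$ enters the regressor with a plus sign and the dependent variable with a minus sign. This pulls the slope below one and, for large enough noise, below zero. All parameter values and the helper `beta_plim` are hypothetical. Constants are dropped as in the text.

```python
import random

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def beta_with_measurement_error(n=24, sd_s=0.005, sigma_e=0.0015, T=50000, seed=2):
    rng = random.Random(seed)
    x_true, y_true, x_obs, y_obs = [], [], [], []
    for _ in range(T):
        s = rng.gauss(0.0, sd_s)                        # true spread s_nt
        d = s / (n - 1) + rng.gauss(0.0, 0.0005)        # true y_{n-1,t+1} - y_nt; EH holds
        e_now, e_next = rng.gauss(0.0, sigma_e), rng.gauss(0.0, sigma_e)
        x_true.append(s / (n - 1))
        y_true.append(d)
        x_obs.append((s + e_now) / (n - 1))             # noise in y_nt raises the measured spread
        y_obs.append(d + e_next - e_now)                # ...and lowers the measured yield change
    return ols_slope(x_true, y_true), ols_slope(x_obs, y_obs)

def beta_plim(n, sd_s, sigma_e):
    """Probability limit of the noisy-data slope in this illustrative setup."""
    return (sd_s**2 - (n - 1) * sigma_e**2) / (sd_s**2 + sigma_e**2)

b_true, b_obs = beta_with_measurement_error()
print(round(b_true, 3), round(b_obs, 3), round(beta_plim(24, 0.005, 0.0015), 3))
```

The bias grows with n because the error is scaled up by $(n-1)$ relative to the spread. Instrumenting the spread with variables uncorrelated with the yield noise is meant to remove this channel.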


Yield Spreads and Future Short Rates 

There is much more truth in proposition (10.2.15), that high yield spreads 
should forecast long-term increases in short rates. This can be tested either 
directly or indirectly. The direct approach is to form the ex post value of the 
short-rate changes that appear on the right-hand side of (10.2.15) and to 
regress this on the yield spread. We define

$$ s^*_{nt} \equiv \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \Delta y_{1,t+i} \tag{10.2.17} $$


that this is not the same as approximating …. The numbers given differ slightly 
from those in Campbell (1995) because that paper uses the sample period 1951:1 to 1990:2, 
erroneously reported as 1952:1 to 1991:2.
Campbell and Ammer (1993), Fama and French (1989), and Keim and Stambaugh (1986) 
show that yield spreads help to forecast excess returns on bonds as well as on other long-term 
assets. Campbell and Shiller (1991) and Shiller, Campbell, and Schoenholtz (1983) show that 
yield spreads tend to forecast declines in long-bond yields.
and run the regression

$$ s^*_{nt} = \mu_n + \gamma_n s_{nt} + \epsilon_{nt}. \tag{10.2.18} $$

The expectations hypothesis implies that $\gamma_n = 1$ for all n.
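The direct test can be seen in a setting where the expectations hypothesis holds. The sketch below simulates a demeaned AR(1) one-period yield. It sets $y_{nt}$ equal to the average of expected future one-period yields, so expected excess returns are zero (constants dropped). It then builds the ex post $s^*_{nt}$ from (10.2.17) and estimates $\gamma_n$ by OLS. The AR(1) short-rate process and all parameter values are assumptions for illustration. Under them the population coefficient is exactly one.

```python
import random

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_gamma(n=6, phi=0.9, sigma=0.001, T=40000, seed=3):
    rng = random.Random(seed)
    # Demeaned one-period yield following an AR(1): y_{1,t+1} = phi * y_{1t} + u_{t+1}.
    y1 = [0.0]
    for _ in range(T + n):
        y1.append(phi * y1[-1] + rng.gauss(0.0, sigma))
    # Under the EH (no premia), y_nt = (1/n) * sum_{i=0}^{n-1} E_t[y_{1,t+i}] = load * y_{1t}.
    load = sum(phi**i for i in range(n)) / n
    spreads, s_star = [], []
    for t in range(T):
        spreads.append(load * y1[t] - y1[t])                     # s_nt = y_nt - y_1t
        s_star.append(sum((1 - i / n) * (y1[t + i] - y1[t + i - 1])
                          for i in range(1, n)))                  # ex post s*_nt, (10.2.17)
    return ols_slope(spreads, s_star)                            # gamma_n in (10.2.18)

print(round(simulate_gamma(), 3))
```

The sample estimate scatters around one because $s^*_{nt}$ contains overlapping forecast errors. That overlap is the source of the econometric issues discussed next.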

Table 10.3 reports estimated $\gamma_n$ coefficients with standard errors, correcting for heteroskedasticity and overlap in the equation errors in the manner discussed in the Appendix. The estimated coefficients have a U shape: 
For small n they are smaller than one but significantly positive; up to a 
year or so they decline with n, becoming insignificantly different from zero; 
beyond one year the coefficients increase and at ten years the coefficient 
is even significantly greater than one. Thus Table 10.3 shows that yield 
spreads have forecasting power for short-rate movements over a horizon of 
two or three months, and again over horizons of several years. Around one 
year, however, yield-spread variation seems almost unrelated to subsequent 
movements in short rates.

The regression equation (10.2.18) contains the same information as a 
regression of $(1/n)$ times the excess n-period return on an n-period bond 
onto the yield spread $s_{nt}$. The relation between excess returns and yields 
implies that the excess-return regression would have a coefficient of $(1 - \gamma_n)$. 
Table 10.3 implies that yield spreads forecast excess returns out to horizons 
of several years, but the forecasting power diminishes towards ten years.

There are several econometric difficulties with the direct approach just 
described. First, one loses n periods of data at the end of the sample period. 
This can be quite serious: For example, the ten-year regression in Table 
10.3 ends in 1981, whereas the three-month regression ends in 1991. This 
makes a substantial difference to the results, as discussed by Campbell and 
Shiller (1991). Second, the error term $\epsilon_{nt}$ is a moving average of order 
$(n - 1)$, so standard errors must be corrected in the manner described in 
the Appendix. This can lead to finite-sample problems when $(n - 1)$ is not 
small relative to the sample size. Third, the regressor is serially correlated 
and correlated with lags of the dependent variable, and this too can cause 
finite-sample problems (see Mankiw and Shapiro [1986], Richardson and 
Stock [1990], and Stambaugh [1986]).
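The overlap correction can be sketched with a heteroskedasticity-and-autocorrelation-consistent (HAC) standard error. The version below uses Bartlett weights over `lags` autocovariances (the Newey-West form). This is an assumption about one reasonable implementation; the Appendix's exact estimator may differ. The simulated regressor is a persistent AR(1), and the error sums overlapping shocks, mimicking the structure of (10.2.18). All names and parameters are hypothetical.

```python
import math
import random

def ols_with_hac_se(x, y, lags):
    """OLS slope of y on a constant and x, with a Bartlett-weighted HAC standard error."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    xd = [a - mx for a in x]
    sxx = sum(a * a for a in xd)
    slope = sum(a * (b - my) for a, b in zip(xd, y)) / sxx
    resid = [(b - my) - slope * a for a, b in zip(xd, y)]
    score = [a * e for a, e in zip(xd, resid)]           # demeaned x_t times residual u_t
    s = sum(g * g for g in score)                         # lag-0 (White) term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1)                          # Bartlett weight keeps s >= 0
        s += 2.0 * w * sum(score[t] * score[t - j] for t in range(j, T))
    return slope, math.sqrt(s) / sxx

rng = random.Random(4)
n, T = 12, 3000
x = [0.0]
for _ in range(T - 1):
    x.append(0.95 * x[-1] + rng.gauss(0.0, 1.0))         # persistent regressor
shocks = [rng.gauss(0.0, 1.0) for _ in range(T + n)]
u = [sum(shocks[t:t + n - 1]) for t in range(T)]         # overlapping error, serially correlated
y = [1.0 * a + e for a, e in zip(x, u)]

b0, se0 = ols_with_hac_se(x, y, 0)
bL, seL = ols_with_hac_se(x, y, n - 1)
print(round(b0, 3), round(se0, 4), round(seL, 4))
```

The slope is unchanged by the choice of lags; only the standard error grows once the overlap is accounted for. With (n - 1) large relative to T, such estimators become unreliable, which is the finite-sample concern raised above.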

Although these econometric problems are important, they do not seem 
to account for the U-shaped pattern of coefficients. Campbell and Shiller 
(1991) find similar results using a vector autoregressive (VAR) methodology 
like that described in Section 7.2.3 of Chapter 7. They find that the long-term yield spread is highly correlated with an unrestricted VAR forecast of 
future short-rate movements, while the intermediate-term yield spread is 
much more weakly correlated with the VAR forecast.


Fama (1984) and Shiller, Campbell, and Schoenholtz (1983) use this approach at the short 
end of the term structure, while Fama and Bliss (1987) extend it to the long end. Campbell 
and Shiller (1991) provide a comprehensive review.
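To interpret Table 10.3, it is helpful to return to equation (10.2.13) and 
rewrite it as

$$ s_{nt} = sy_{nt} + sr_{nt}, \tag{10.2.19} $$

where

$$ sy_{nt} \equiv E_t[s^*_{nt}] = E_t \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \Delta y_{1,t+i} $$

and

$$ sr_{nt} \equiv \frac{1}{n} E_t \sum_{i=0}^{n-1} (r_{n-i,t+1+i} - y_{1,t+i}). $$

In general the yield spread is the sum of two components, one that forecasts 
interest rate changes ($sy_{nt}$) and one that forecasts excess returns on long 
bonds ($sr_{nt}$). Since $s^*_{nt} - sy_{nt}$ is a forecast error that is uncorrelated with both components, the regression coefficient $\gamma_n$ in equation 
(10.2.18) is

$$ \gamma_n = \frac{\mathrm{Cov}[s^*_{nt}, s_{nt}]}{\mathrm{Var}[s_{nt}]} = \frac{\mathrm{Var}[sy_{nt}] + \mathrm{Cov}[sy_{nt}, sr_{nt}]}{\mathrm{Var}[sy_{nt}] + \mathrm{Var}[sr_{nt}] + 2\,\mathrm{Cov}[sy_{nt}, sr_{nt}]}. \tag{10.2.20} $$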


For any given variance of excess-return forecasts $sr_{nt}$, as the variance 
of interest rate forecasts $sy_{nt}$ goes to zero the coefficient $\gamma_n$ goes to zero, 
but as the variance of $sy_{nt}$ increases the coefficient $\gamma_n$ goes to one. The U-shaped pattern of regression coefficients in Table 10.3 may be explained by 
reduced forecastability of interest rate movements at horizons around one 
year. There may be some short-run forecastability arising from Federal Reserve operating procedures, and some long-run forecastability arising from 
business-cycle effects on interest rates, but at a one-year horizon the Federal 
Reserve may smooth interest rates so that the variability of $sy_{nt}$ is small. Balduzzi, Bertola, and Foresi (1993), Rudebusch (1995), and Roberds, Runkle, 
and Whiteman (1996) argue for this interpretation of the evidence. Consistent with this explanation, Mankiw and Miron (1986) show that the predictions of the expectations hypothesis fit the data better in periods when 
interest rate movements have been highly forecastable, such as the period 
immediately before the founding of the Federal Reserve System.
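Equation (10.2.20) can be explored numerically. The sketch below codes the formula directly and checks it against OLS on simulated components. Independence of $sy_{nt}$ and $sr_{nt}$, and the unit-variance forecast error added to form $s^*_{nt}$, are assumptions for illustration. Shrinking `var_sy` toward zero drives $\gamma_n$ toward zero, the mechanism proposed above for the dip near one year.

```python
import random

def gamma_from_components(var_sy, var_sr, cov_sy_sr=0.0):
    """Population gamma_n from (10.2.20)."""
    return (var_sy + cov_sy_sr) / (var_sy + var_sr + 2.0 * cov_sy_sr)

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(5)
T, var_sy, var_sr = 50000, 0.25, 1.0
sy = [rng.gauss(0.0, var_sy ** 0.5) for _ in range(T)]
sr = [rng.gauss(0.0, var_sr ** 0.5) for _ in range(T)]
s = [a + b for a, b in zip(sy, sr)]                      # observed spread, (10.2.19)
s_star = [a + rng.gauss(0.0, 1.0) for a in sy]           # realized s* = sy + forecast error
print(round(ols_slope(s, s_star), 3), gamma_from_components(var_sy, var_sr))

for v in (0.0, 0.1, 1.0, 10.0, 1000.0):
    print(v, round(gamma_from_components(v, 1.0), 4))
```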


10.3 Conclusion 
The results in Table 10.3 imply that naive investors, who judge bonds by their 
yields to maturity and buy long bonds when their yields are relatively high, 
have tended to earn superior returns in the postwar period in the United 
States. This finding is reminiscent of the finding discussed in Chapter 7, that 
stock returns tend to be higher when dividend yields are high at the start of 
the holding period. As in the stock market case, it is not clear whether this 
result reflects a failure of rationality on the part of investors or the presence 
of time-varying risk premia. Froot (1989) has used survey data to argue that 
bond market investors have irrational expectations, but there is also much 
theoretical work, discussed in the next chapter, that explores the impact of 
time-varying risk on the term structure of interest rates.

This chapter has concentrated on the forecasting power of yield spreads 
for future movements in nominal interest rates. Yield spreads are also useful 
in forecasting other variables. For example, one can decompose nominal 
rates into inflation rates and real interest rates; the evidence is that most of 
the long-run forecasting power of the term structure is for inflation rather 
than real interest rates (see Fama [1975, 1990] and Mishkin [1990a, 1990b]). 
We mentioned in Chapter 8 that the slope of the term structure has some 
ability to forecast excess returns on stocks as well as bonds. Other recent 
studies by Chen (1991b) and Estrella and Hardouvelis (1991) have shown 
that the term structure forecasts real economic activity, since inverted yield 
curves tend to precede recessions and steeply upward-sloping yield curves 
tend to precede expansions.


Problems—Chapter 10 


10.1 You are told that an 8-year nominal zero-coupon bond has a log yield 
to maturity of 9.1%, and a 9-year nominal zero-coupon bond has a log yield 
of 8.0%.

10.1.1 Can the pure expectations theory of the term structure describe 
these data?

10.1.2 A year goes by, and the bonds in part (a) still have the same 
yields to maturity. Can the pure expectations theory of the term structure 
describe these new data?

10.1.3 How would your answers change if you were told that the bonds 
have an 8% coupon rate per year, rather than zero coupons?
10.2 Suppose that the monetary authority controls short-term interest 
rates by setting

$$ y_{1,t+1} = y_{1t} + \lambda (y_{2t} - y_{1t}) + \epsilon_{t+1}, $$

with $\lambda > 0$. Intuitively, the monetary authority tries to smooth interest 
rates but raises them when the yield curve is steep. Suppose also that the 
two-period bond yield satisfies

$$ y_{2t} = \frac{y_{1t} + E_t[y_{1,t+1}]}{2} + x_t, $$

where $x_t$ is a term premium that represents the deviation of the two-period 
yield from the log pure expectations hypothesis, equation (10.2.10). The 
variable $x_t$ follows an AR(1) process

$$ x_t = \phi x_{t-1} + \eta_t. $$

The error terms $\epsilon_t$ and $\eta_t$ are serially uncorrelated and uncorrelated with 
each other.

10.2.1 Show that this model can be solved for an interest-rate process of 
the form

$$ y_{1,t+1} = y_{1t} + \gamma x_t + \epsilon_{t+1}. $$

Express the coefficient $\gamma$ as a function of the other parameters in the 
model.


10.2.2 The expectations hypothesis of the term structure is often tested 
in the manner of equation (10.2.17) by regressing the scaled change in 
the short rate onto the yield spread,

$$ \frac{y_{1,t+1} - y_{1t}}{2} = \alpha + \beta (y_{2t} - y_{1t}) + u_{t+1}, $$

and testing the hypothesis that the coefficient $\beta = 1$. If the model 
described above holds, what is the population value of the regression 
coefficient $\beta$?

10.2.3 Now consider a version of the problem involving n-period bonds. 
The monetary authority sets short-term interest rates as

$$ y_{1,t+1} = y_{1t} + \lambda (y_{nt} - y_{1t}) + \epsilon_{t+1}, $$

and the n-period bond yield is determined by

$$ y_{nt} - y_{1t} = (n - 1) E_t[y_{n,t+1} - y_{nt}] + x_t, $$

where $x_t$ now measures the deviation of the n-period yield from the log 
pure expectations hypothesis (10.2.14). (This formulation ignores the 
distinction between $y_n$ and $y_{n-1}$.) As before, $x_t$ follows an AR(1) process. 
What is the coefficient $\gamma$ in this case? What is the regression coefficient 
$\beta$ in a regression of the form (10.2.16),

$$ y_{n,t+1} - y_{nt} = \alpha + \beta \frac{y_{nt} - y_{1t}}{n - 1} + u_{t+1}? $$

10.2.4 Do you find the model you have studied in this problem to be a 
plausible explanation of empirical findings on the term structure? Why 
or why not?

Note: This problem is based on McCallum (1994).
Term-Structure Models


THIS CHAPTER EXPLORES the large modern literature on fully specified general-equilibrium models of the term structure of interest rates. Much of 
this literature is set in continuous time, which simplifies some of the theoretical analysis but complicates empirical implementation. Since we focus 
on the econometric testing of the models and their empirical implications, 
we adopt a discrete-time approach; however, we take care to relate all our 
results to their continuous-time equivalents. We follow the literature by first 
developing models for real bonds, but we discuss in some detail how these 
models can be used to price nominal bonds.

All the models in this chapter start from the general asset pricing condition introduced as (8.1.3) in Chapter 8: $1 = E_t[(1 + R_{i,t+1}) M_{t+1}]$, where 
$R_{i,t+1}$ is the real return on some asset i and $M_{t+1}$ is the stochastic discount factor. As we explained in Section 8.1 of Chapter 8, this condition implies that 
the expected return on any asset is negatively related to its covariance with 
the stochastic discount factor. In models with utility-maximizing investors, 
the stochastic discount factor measures the marginal utility of investors. Assets whose returns covary positively with the stochastic discount factor tend 
to pay off when marginal utility is high; they deliver wealth at times when 
wealth is most valuable to investors. Investors are willing to pay high prices 
and accept low returns on such assets.

Fixed-income securities are particularly easy to price using this framework. When cash flows are random, the stochastic properties of the cash 
flows help to determine the covariance of an asset's return with the stochastic discount factor. But a fixed-income security has deterministic cash flows, 
so it covaries with the stochastic discount factor only because there is time-variation in discount rates. This variation in discount rates is driven by the 
time-series behavior of the stochastic discount factor, so term-structure models are equivalent to time-series models for the stochastic discount factor.

From (10.1.4) in Chapter 10, we know that returns on n-period real 
zero-coupon bonds are related to real bond prices in a particularly simple 
way: $(1 + R_{n,t+1}) = P_{n-1,t+1}/P_{nt}$. Substituting this into (8.1.3), we find that 
the real price of an n-period real bond, $P_{nt}$, satisfies

$$ P_{nt} = E_t[P_{n-1,t+1} M_{t+1}]. \tag{11.0.1} $$

This equation lends itself to a recursive approach. We model M as a function 
of those state variables that are relevant for forecasting the M process. 
Given that process and the function relating $P_{n-1}$ to state variables, we can 
calculate the function relating $P_n$ to state variables. We start the calculation 
by noting that $P_{0t} = 1$.

Equation (11.0.1) can also be solved forward to express the n-period 
bond price as the expected product of n stochastic discount factors:

$$ P_{nt} = E_t[M_{t+1} M_{t+2} \cdots M_{t+n}]. \tag{11.0.2} $$

Although we emphasize the recursive approach, in some models it is more 
convenient to work directly with (11.0.2).

Section 11.1 explores a class of simple models in which all relevant 
variables are conditionally lognormal and log bond yields are linear in state 
variables. These affine-yield models include all the most commonly used term-structure models. Section 11.2 shows how these models can be fit to nominal 
interest rate data, and reviews their strengths and weaknesses. One of the 
main uses of term-structure models is in pricing interest-rate derivative securities; we discuss this application in Section 11.3. We show how standard 
term-structure models can be modified so that they fit the current term 
structure exactly. We then use the models to price forwards, futures, and 
options on fixed-income securities.


11.1 Affine-Yield Models 


To keep matters simple, we assume throughout this section that the distribution of the stochastic discount factor $M_{t+1}$ is conditionally lognormal. We 
specify models in which bond prices are jointly lognormal with $M_{t+1}$. We 
can then take logs of (11.0.1) to obtain

$$ p_{nt} = E_t[m_{t+1} + p_{n-1,t+1}] + \frac{1}{2} \mathrm{Var}_t[m_{t+1} + p_{n-1,t+1}], \tag{11.1.1} $$

where as usual lowercase letters denote the logs of the corresponding uppercase letters, so for example $m_{t+1} \equiv \log(M_{t+1})$. This is the basic equation 
we shall use.

We begin with two models in which a single state variable forecasts the 
stochastic discount factor. Section 11.1.1 discusses the first model, in which 
$m_{t+1}$ is homoskedastic, while Section 11.1.2 discusses the second model, 
in which the conditional variance of $m_{t+1}$ changes over time. These are 
discrete-time versions of the well-known models of Vasicek (1977) and Cox, 
Ingersoll, and Ross (1985a), respectively. Section 11.1.3 then considers a 
more general model with two state variables, a discrete-time version of the 
model of Longstaff and Schwartz (1992). All of these models have the 
property that log bond prices, and hence log bond yields, are linear or 
affine in the state variables. This ensures the desired joint lognormality of 
bond prices with the stochastic discount factor. Section 11.1.4 describes the 
general properties of these affine-yield models, and discusses some alternative 
modelling approaches.


11.1.1 A Homoskedastic Single-Factor Model 


It is convenient to work with the negative of the log stochastic discount 
factor, $-m_{t+1}$. Without loss of generality, this can be expressed as the sum 
of its one-period-ahead conditional expectation $x_t$ and an innovation $\epsilon_{t+1}$:

$$ -m_{t+1} = x_t + \epsilon_{t+1}. \tag{11.1.2} $$

We assume that $\epsilon_{t+1}$ is normally distributed with constant variance.

Next we assume that $x_t$ follows the simplest interesting time-series 
process, a univariate AR(1) process with mean $\mu$ and persistence $\phi$. The 
shock to $x_{t+1}$ is written $\xi_{t+1}$:

$$ x_{t+1} = (1 - \phi)\mu + \phi x_t + \xi_{t+1}. \tag{11.1.3} $$

The innovations to $m_{t+1}$ and $x_{t+1}$ may be correlated. To capture this, we 
write $\epsilon_{t+1}$ as

$$ \epsilon_{t+1} = \beta \xi_{t+1} + \eta_{t+1}, \tag{11.1.4} $$

where $\xi_{t+1}$ and $\eta_{t+1}$ are normally distributed with constant variances and are 
uncorrelated with each other.

The presence of the uncorrelated shock $\eta_{t+1}$ only affects the average 
level of the term structure and not its average slope or its time-series behavior. To simplify notation, we accordingly drop it and assume that $\epsilon_{t+1} = \beta \xi_{t+1}$. 
Equation (11.1.2) can then be rewritten as

$$ -m_{t+1} = x_t + \beta \xi_{t+1}. \tag{11.1.5} $$

The innovation $\xi_{t+1}$ is now the only shock in the system; accordingly we can 
write its variance simply as $\sigma^2$ without causing confusion.

Equations (11.1.5) and (11.1.3) imply that $-m_{t+1}$ can be written as an 
ARMA(1,1) process since it is the sum of an AR(1) process and white noise.


Our discrete-time presentation follows Singleton (1990), Sun (1992), and especially 
Backus (1993). Sun (1992) explores the relation between discrete-time and continuous-time 
models in more detail.
In fact, $-m_{t+1}$ has the same structure as asset returns did in the example of 
Chapter 7, Section 7.1.4. As in that example, it is important to realize that 
$-m_{t+1}$ is not a univariate process even though its conditional expectation $x_t$ 
is univariate. Thus the univariate autocorrelations of $-m_{t+1}$ do not tell us 
all we need to know for asset pricing; different sets of parameter values, with 
different implications for asset pricing, could be consistent with the same 
set of univariate autocorrelations for $-m_{t+1}$. For example, these autocorrelations could all be zero because $\sigma^2 = 0$, which would make interest rates 
constant, but they could also be zero for $\sigma^2 \neq 0$ if $\beta$ takes on a particular 
value, and in this case interest rates would vary over time.
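The "particular value" of $\beta$ can be made explicit with a short derivation from (11.1.3) and (11.1.5). For $k \ge 1$ the lag-k autocovariance of $-m_{t+1}$ is $\phi^{k-1}\sigma^2[\phi/(1-\phi^2) + \beta]$, so every autocorrelation vanishes when $\beta = -\phi/(1-\phi^2)$. The sketch checks this analytically and by simulation, and confirms that the short rate (11.1.7) remains persistent. Parameter values and function names are hypothetical.

```python
import random

def autocov_neg_m(k, phi, beta, sigma):
    """Lag-k (k >= 1) autocovariance of -m_{t+1} = x_t + beta * xi_{t+1}."""
    return phi ** (k - 1) * sigma**2 * (phi / (1 - phi**2) + beta)

def sample_autocorr(z, k):
    mean = sum(z) / len(z)
    d = [v - mean for v in z]
    return sum(d[t] * d[t - k] for t in range(k, len(d))) / sum(v * v for v in d)

phi, sigma, mu = 0.9, 0.001, 0.004
beta_white = -phi / (1 - phi**2)            # beta that removes all autocorrelation in -m

rng = random.Random(6)
x, neg_m, short_rate = mu, [], []
for _ in range(50000):
    xi = rng.gauss(0.0, sigma)
    neg_m.append(x + beta_white * xi)                      # -m_{t+1} = x_t + beta * xi_{t+1}
    short_rate.append(x - beta_white**2 * sigma**2 / 2)    # y_1t from (11.1.7)
    x = (1 - phi) * mu + phi * x + xi                      # x_{t+1} from (11.1.3)

print([round(sample_autocorr(neg_m, k), 4) for k in (1, 2, 5)])
print(round(sample_autocorr(short_rate, 1), 3))            # the short rate is still persistent
```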
We can determine the price of a one-period bond by noting that when 
n = 1, $p_{n-1,t+1} = p_{0,t+1} = 0$, so the terms involving $p_{n-1,t+1}$ in equation 
(11.1.1) drop out. Substituting (11.1.5) and (11.1.3) into (11.1.1), we have

$$ p_{1t} = E_t[m_{t+1}] + \frac{1}{2}\mathrm{Var}_t[m_{t+1}] = -x_t + \frac{\beta^2\sigma^2}{2}. \tag{11.1.6} $$

The one-period bond yield is $y_{1t} = -p_{1t}$, so

$$ y_{1t} = x_t - \frac{\beta^2\sigma^2}{2}. \tag{11.1.7} $$

The short rate equals the state variable less a constant term, so it inherits 
the AR(1) dynamics of the state variable. Indeed, we can think of the short 
rate as measuring the state of the economy in this model. Note that there 
is nothing in equation (11.1.7) that rules out a negative short rate.

We now guess that the form of the price function for an n-period bond 
is

$$ -p_{nt} = A_n + B_n x_t. \tag{11.1.8} $$

Since the n-period bond yield is $y_{nt} = -p_{nt}/n$, we are guessing that the yield on 
a bond of any maturity is linear or affine in the state variable $x_t$ (Brown and 
Schaefer [1991]). We already know that bond prices for n = 0 and n = 1 
satisfy equation (11.1.8), with $A_0 = B_0 = 0$, $A_1 = -\beta^2\sigma^2/2$, and $B_1 = 1$. We 
proceed to verify our guess by showing that it is consistent with the pricing 
relation (11.1.1). At the same time we can derive recursive formulas for the 
coefficients $A_n$ and $B_n$.

Our guess for the price function (11.1.8) implies that the two terms on 
the right-hand side of (11.1.1) are

$$
\begin{aligned}
E_t[m_{t+1} + p_{n-1,t+1}] &= -x_t - A_{n-1} - B_{n-1}\left[(1 - \phi)\mu + \phi x_t\right], \\
\mathrm{Var}_t[m_{t+1} + p_{n-1,t+1}] &= (\beta + B_{n-1})^2 \sigma^2.
\end{aligned}
\tag{11.1.9}
$$

Substituting (11.1.8) and (11.1.9) into (11.1.1), we get

$$ A_n + B_n x_t - x_t - A_{n-1} - B_{n-1}(1 - \phi)\mu - B_{n-1}\phi x_t + \frac{(\beta + B_{n-1})^2\sigma^2}{2} = 0. \tag{11.1.10} $$

This must hold for any $x_t$, so the coefficients on $x_t$ must sum to zero and the 
remaining coefficients must also sum to zero. This implies

$$
\begin{aligned}
B_n &= 1 + \phi B_{n-1}, \\
A_n - A_{n-1} &= (1 - \phi)\mu B_{n-1} - \frac{(\beta + B_{n-1})^2\sigma^2}{2}.
\end{aligned}
\tag{11.1.11}
$$


We have now verified the guess (11.1.8), since with the coefficients in 
(11.1.11) the price function (11.1.8) satisfies the asset pricing equation 
(11.1.1) and its assumption that bond returns are conditionally lognormal.
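The recursion (11.1.11) is easy to run. The sketch below computes $A_n$ and $B_n$ and checks $B_n$ against the closed form $(1-\phi^n)/(1-\phi)$ stated among the model's implications. It also compares the implied price $\exp(-A_n - B_n x_t)$ with a Monte Carlo estimate of the expected product of discount factors in (11.0.2). The parameter values (monthly-scale $\mu$, $\phi$, $\beta$, $\sigma$, starting state `x0`) and function names are hypothetical.

```python
import math
import random

def affine_coefficients(N, mu, phi, beta, sigma):
    """A_n, B_n for n = 0..N from the recursion (11.1.11), starting at A_0 = B_0 = 0."""
    A, B = [0.0], [0.0]
    for _ in range(N):
        A_prev, B_prev = A[-1], B[-1]
        B.append(1.0 + phi * B_prev)
        A.append(A_prev + (1.0 - phi) * mu * B_prev - (beta + B_prev) ** 2 * sigma**2 / 2.0)
    return A, B

def mc_bond_price(n, x0, mu, phi, beta, sigma, paths=20000, seed=7):
    """Monte Carlo estimate of E_t[M_{t+1} ... M_{t+n}] under (11.1.3) and (11.1.5)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        x, log_m_sum = x0, 0.0
        for _ in range(n):
            xi = rng.gauss(0.0, sigma)
            log_m_sum += -x - beta * xi                  # m_{t+1} = -x_t - beta * xi_{t+1}
            x = (1.0 - phi) * mu + phi * x + xi          # state update (11.1.3)
        total += math.exp(log_m_sum)
    return total / paths

mu, phi, beta, sigma, x0 = 0.004, 0.9, -5.0, 0.002, 0.006
A, B = affine_coefficients(10, mu, phi, beta, sigma)
n = 5
print(math.exp(-A[n] - B[n] * x0), mc_bond_price(n, x0, mu, phi, beta, sigma))
```

The close agreement of the two prices is a numerical check that the recursive approach (11.0.1) and the forward solution (11.0.2) coincide in this model.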


Implications of the Homoskedastic Model 

The homoskedastic bond pricing model has several interesting implications. 
First, the coefficient $B_n$ measures the fall in the log price of an n-period 
bond when there is an increase in the state variable $x_t$ or equivalently in 
the one-period interest rate $y_{1t}$. It therefore measures the sensitivity of the 
n-period bond return to the one-period interest rate. Equation (11.1.11) 
shows that the coefficient $B_n$ follows a simple univariate linear difference 
equation in n, with solution $(1 - \phi^n)/(1 - \phi)$. As n increases, $B_n$ approaches 
a limit $B_\infty = 1/(1 - \phi)$. Thus bond prices fall when short rates rise, and the 
sensitivity of bond returns to short rates increases with maturity.

Note that $B_n$ is different from duration, defined in Section 10.1.2 of 
Chapter 10. Duration measures the sensitivity of the n-period bond return 
to the n-period bond yield, and for zero-coupon bonds duration equals 
maturity. $B_n$ measures the sensitivity of the n-period bond return to the 
one-period interest rate; for n > 1 and $\phi < 1$ it is less than maturity because the n-period 
bond yield moves less than one-for-one with the one-period interest rate.

A second implication of the model is that the expected log excess return 
on an n-period bond over a one-period bond, $E_t[r_{n,t+1}] - y_{1t} = E_t[p_{n-1,t+1}] - p_{nt} + p_{1t}$, is given by

$$
\begin{aligned}
E_t[r_{n,t+1}] - y_{1t} &= -\mathrm{Cov}_t[r_{n,t+1}, m_{t+1}] - \frac{\mathrm{Var}_t[r_{n,t+1}]}{2} \\
&= B_{n-1}\,\mathrm{Cov}_t[x_{t+1}, m_{t+1}] - \frac{B_{n-1}^2\,\mathrm{Var}_t[x_{t+1}]}{2} \\
&= -\beta B_{n-1}\sigma^2 - \frac{B_{n-1}^2\sigma^2}{2}.
\end{aligned}
\tag{11.1.12}
$$

The first equality in (11.1.12) is a general result, discussed in Chapter 8, 
that holds for the excess log return on any asset over the riskfree interest 
rate. It can be obtained by taking logs of the fundamental relation $1 = E_t[(1 + R_{i,t+1}) M_{t+1}]$ for the n-period bond and the short interest rate, and 
then taking the difference between the two equations. It says that the expected excess log return is the sum of a risk premium term and a Jensen's 
Inequality term in the own variance, which appears because we are working 
in logs.




The second equality in (11.1.12) uses the fact that the unexpected component of the log return on an n-period bond is just $-B_{n-1}$ times the innovation in the state variable. The third equality in (11.1.12) uses the fact 
that the conditional variance of $x_{t+1}$, and its conditional covariance with $m_{t+1}$, 
are constants to show that the expected log excess return on any bond is 
constant over time, so that the log expectations hypothesis (but not the log 
pure expectations hypothesis) holds.

$-B_{n-1}$ is the coefficient from a regression of n-period log bond returns 
on state-variable innovations, so we can interpret $-B_{n-1}$ as the bond's loading on the single source of risk and $\beta\sigma^2$ as the reward for bearing a unit of 
risk. Alternatively, following Vasicek (1977) and others, we might calculate 
the price of risk as the ratio of the expected excess log return on a bond, plus 
one half its own variance to adjust for Jensen's Inequality, to the standard 
deviation of the excess log return on the bond. Defined this way, the price 
of risk is just $-\beta\sigma$ in this model.
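A small sketch of (11.1.12) and the Vasicek-style price of risk, with hypothetical parameter values, makes the point concrete. The expected log excess return varies with maturity through $B_{n-1}$. Adding back half the variance and dividing by the standard deviation $B_{n-1}\sigma$ gives $-\beta\sigma$ at every maturity. With $\beta < 0$ the premia in this example are positive.

```python
def b_coef(n, phi):
    """Closed-form B_n = (1 - phi**n) / (1 - phi), valid for phi != 1."""
    return (1 - phi**n) / (1 - phi)

def expected_excess_log_return(n, phi, beta, sigma):
    """E_t[r_{n,t+1}] - y_{1t} from (11.1.12); constant over time."""
    b = b_coef(n - 1, phi)
    return -beta * b * sigma**2 - b**2 * sigma**2 / 2

def price_of_risk(n, phi, beta, sigma):
    """(Expected excess log return + half its variance) / its standard deviation; needs n >= 2."""
    sd = b_coef(n - 1, phi) * sigma
    return (expected_excess_log_return(n, phi, beta, sigma) + sd**2 / 2) / sd

phi, beta, sigma = 0.9, -5.0, 0.002
for n in (2, 12, 60, 120):
    print(n, expected_excess_log_return(n, phi, beta, sigma), price_of_risk(n, phi, beta, sigma))
```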

The homoskedastic bond pricing model also has implications for the 
pattern of forward rates, and hence for the shape of the yield curve. To 
derive these implications, we note that in any term-structure model the n-period-ahead one-period forward rate $f_{nt}$ satisfies

$$
\begin{aligned}
f_{nt} &= p_{nt} - p_{n+1,t} = (E_t[p_{n,t+1}] - p_{n+1,t}) - (E_t[p_{n,t+1}] - p_{nt}) \\
&= y_{1t} + (E_t[r_{n+1,t+1}] - y_{1t}) - (E_t[p_{n,t+1}] - p_{nt}).
\end{aligned}
\tag{11.1.13}
$$

In this model $E_t[p_{n,t+1}] - p_{nt} = -B_n E_t[\Delta x_{t+1}]$, and $E_t[r_{n+1,t+1}] - y_{1t}$ is given 
by (11.1.12). Substituting into (11.1.13) and using $B_n = (1 - \phi^n)/(1 - \phi)$, we get

$$
\begin{aligned}
f_{nt} &= \mu + \phi^n (x_t - \mu) - \left(\beta + \frac{1 - \phi^n}{1 - \phi}\right)^2 \frac{\sigma^2}{2} \\
&= \left[\mu - \left(\beta + \frac{1}{1 - \phi}\right)^2 \frac{\sigma^2}{2}\right] + \phi^n \left[x_t - \mu + \left(\beta + \frac{1}{1 - \phi}\right) \frac{\sigma^2}{1 - \phi}\right] - \phi^{2n} \frac{\sigma^2}{2(1 - \phi)^2}.
\end{aligned}
\tag{11.1.14}
$$

The first equality in (11.1.14) shows that the change in the n-period forward 
rate is $\phi^n$ times the change in $x_t$. Thus movements in the forward rate die 
out geometrically at rate $\phi$. This can be understood by noting that the 
log expectations hypothesis holds in this model, so forward-rate movements 
reflect movements in the expected future short rate, which are given by $\phi^n$ 
times movements in the current short rate.
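Both lines of (11.1.14) can be checked against the forward rate computed directly from the affine coefficients, $f_{nt} = p_{nt} - p_{n+1,t} = A_{n+1} - A_n + (B_{n+1} - B_n)x_t$. The sketch below does this for several values of the state variable, using hypothetical parameters. Varying `x` shifts only the $\phi^n$ component; the $\phi^{2n}$ component is fixed.

```python
def affine_coefficients(N, mu, phi, beta, sigma):
    """A_n, B_n for n = 0..N from the recursion (11.1.11)."""
    A, B = [0.0], [0.0]
    for _ in range(N):
        A_prev, B_prev = A[-1], B[-1]
        B.append(1.0 + phi * B_prev)
        A.append(A_prev + (1.0 - phi) * mu * B_prev - (beta + B_prev) ** 2 * sigma**2 / 2.0)
    return A, B

def forward_from_coefficients(n, x, A, B):
    """f_nt = p_nt - p_{n+1,t}, with -p_nt = A_n + B_n x_t."""
    return A[n + 1] - A[n] + (B[n + 1] - B[n]) * x

def forward_first_line(n, x, mu, phi, beta, sigma):
    return mu + phi**n * (x - mu) - (beta + (1 - phi**n) / (1 - phi)) ** 2 * sigma**2 / 2

def forward_three_components(n, x, mu, phi, beta, sigma):
    b_inf = 1 / (1 - phi)
    level = mu - (beta + b_inf) ** 2 * sigma**2 / 2                 # does not vary with n
    slow = phi**n * (x - mu + (beta + b_inf) * sigma**2 * b_inf)    # dies out at rate phi
    fast = -(phi ** (2 * n)) * sigma**2 * b_inf**2 / 2              # dies out at rate phi^2
    return level + slow + fast

mu, phi, beta, sigma = 0.004, 0.95, -2.0, 0.002
A, B = affine_coefficients(121, mu, phi, beta, sigma)
for x in (0.0, 0.004, 0.008):
    print(x, [round(forward_from_coefficients(n, x, A, B), 6) for n in (0, 6, 24, 120)])
```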
As maturity n increases, the forward rate approaches

$$ \mu - \left(\beta + \frac{1}{1 - \phi}\right)^2 \frac{\sigma^2}{2}, $$

a constant that does not depend on the current value of the state variable $x_t$. 
Equation (11.1.7) implies that the average short rate is $\mu - \beta^2\sigma^2/2$. Thus 
the difference between the limiting forward rate and the average short rate 
is

$$ -\left(\frac{1}{1 - \phi}\right)^2 \frac{\sigma^2}{2} - \left(\frac{\beta}{1 - \phi}\right)\sigma^2. $$

This is the same as the limiting expected log excess return on a long-term 
bond. Because of the Jensen's Inequality effect, the log forward-rate curve 
tends to slope downwards towards its limit unless $\beta$ is sufficiently negative, 
$\beta < -1/(2(1 - \phi))$.
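The limiting comparison can also be checked numerically. The sketch evaluates the limiting forward rate minus the average short rate $\mu - \beta^2\sigma^2/2$. It confirms that this gap equals the limit of the expected excess return in (11.1.12) as $B_{n-1} \to 1/(1-\phi)$. It also shows that the sign flips at $\beta = -1/(2(1-\phi))$. Parameter values and function names are hypothetical.

```python
def limiting_forward(mu, phi, beta, sigma):
    return mu - (beta + 1 / (1 - phi)) ** 2 * sigma**2 / 2

def limit_minus_average_short(mu, phi, beta, sigma):
    average_short = mu - beta**2 * sigma**2 / 2         # mean of (11.1.7)
    return limiting_forward(mu, phi, beta, sigma) - average_short

def limiting_excess_return(phi, beta, sigma):
    b_inf = 1 / (1 - phi)
    return -beta * b_inf * sigma**2 - b_inf**2 * sigma**2 / 2

mu, phi, sigma = 0.004, 0.9, 0.002
threshold = -1 / (2 * (1 - phi))   # below this beta, the limit lies above the average short rate
for beta in (0.0, threshold + 1.0, threshold - 1.0):
    print(beta, limit_minus_average_short(mu, phi, beta, sigma))
```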

As $x_t$ varies, the forward-rate curve may take on different shapes. The 
second equality in (11.1.14) shows that the forward-rate curve can be written 
as the sum of a component that does not vary with n, a component that dies 
out with n at rate $\phi$, and a component that dies out with n at rate $\phi^2$. The 
third component has a constant coefficient with a negative sign; thus there 
is always a steeply rising component of the forward-rate curve. The second 
component has a coefficient that varies with $x_t$, so this component may 
be slowly rising, slowly falling, or flat. Hence the forward-rate curve may be 
rising throughout, falling throughout (inverted), or may be rising at first and 
then falling (hump-shaped) if the third component initially dominates and 
then is dominated by the second component further out along the curve. 
These are the most common shapes for nominal forward-rate curves. Thus, 
if one is willing to apply the model to nominal interest rates, disregarding 
the fact that it allows interest rates to go negative, one can fit most observed 
nominal term structures. However, the model cannot generate a forward-rate curve which is falling at first and then rising (inverted hump-shaped), as 
occasionally seen in the data.

It is worth noting that when $\phi = 1$, the one-period interest rate follows a random walk. In this case the coefficients $A_n$ and $B_n$ never converge as $n$ increases. We have $B_n = n$ and $A_n - A_{n-1} = -(\beta + n - 1)^2\sigma^2/2$. The forward rate becomes $f_{nt} = x_t - (\beta + n)^2\sigma^2/2$, which may increase with maturity at first if $\beta$ is negative but eventually decreases with maturity forever. Thus the homoskedastic bond pricing model does not allow the limiting forward rate to be both finite and time-varying. Either $\phi < 1$, in which case the limiting forward rate is constant over time, or $\phi = 1$, in which case there is no finite limiting forward rate. This restriction may seem rather counterintuitive. In fact it follows from a very general result, derived by Dybvig, Ingersoll, and Ross (1996), that the limiting forward rate, if it exists, can never fall. In the homoskedastic model with $\phi < 1$ the limiting forward rate never falls because it is constant; in the homoskedastic model with $\phi = 1$ the limiting forward rate does not exist.
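The random-walk case can be confirmed by running the homoskedastic recursions with $\phi = 1$. The recursions are assumed and restated in the code comments, and the parameter values are hypothetical. With $\beta = -3$, the forward curve should peak at the maturity where $\beta + n = 0$ and then decline without bound.

```python
# Random-walk case phi = 1 of the assumed homoskedastic recursions:
#   B_n = 1 + phi*B_{n-1};  A_n - A_{n-1} = (1-phi)*mu*B_{n-1} - (beta + B_{n-1})**2 * sigma**2 / 2
mu, phi, beta, sigma, x = 0.005, 1.0, -3.0, 0.001, 0.004  # hypothetical; mu drops out when phi = 1
n_max = 400

A, B = [0.0], [0.0]
for _ in range(n_max):
    A.append(A[-1] + (1 - phi) * mu * B[-1] - (beta + B[-1]) ** 2 * sigma**2 / 2)
    B.append(1 + phi * B[-1])

fwd = [(A[n + 1] - A[n]) + (B[n + 1] - B[n]) * x for n in range(n_max)]
closed_form = [x - (beta + n) ** 2 * sigma**2 / 2 for n in range(n_max)]

peak = max(range(n_max), key=lambda n: fwd[n])  # the curve peaks where beta + n = 0
print("B_10 =", B[10], " peak maturity =", peak, " last forward rate =", fwd[-1])
```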
The discrete-time model developed in this section is closely related to the continuous-time model of Vasicek (1977). Vasicek specifies a continuous-time AR(1), or Ornstein-Uhlenbeck, process for the short interest rate $r$, given by the following stochastic differential equation:

$$ dr = \kappa(\theta - r)\,dt + \sigma\,dB, \tag{11.1.15} $$

where $\kappa$, $\theta$, and $\sigma$ are constants. Vasicek also assumes that the price of interest rate risk, defined as the ratio of the expected excess return on a bond to the standard deviation of the excess return on the bond, is a constant that does not depend on the level of the short interest rate. The model of this section derives an AR(1) process for the short rate and a constant price of risk from assumptions on the stochastic discount factor.
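One way to see the connection is through the exact discretization of (11.1.15). Sampled at a fixed interval $\Delta$, the Ornstein-Uhlenbeck short rate is a Gaussian AR(1) with $\phi = e^{-\kappa\Delta}$, mean $\theta$, and innovation variance $\sigma^2(1 - e^{-2\kappa\Delta})/(2\kappa)$. The sketch below simulates that AR(1) with hypothetical parameter values and recovers $\phi$ by ordinary least squares. It maps only the short-rate dynamics and says nothing about Vasicek's price of risk.

```python
import math
import random


def vasicek_to_ar1(kappa, theta, sigma, dt):
    """Exact discretization of dr = kappa*(theta - r) dt + sigma dB at sampling interval dt.

    Returns (phi, mu, sd) such that r_{t+dt} = (1 - phi) * mu + phi * r_t + sd * z, z ~ N(0, 1).
    """
    phi = math.exp(-kappa * dt)
    sd = sigma * math.sqrt((1 - math.exp(-2 * kappa * dt)) / (2 * kappa))
    return phi, theta, sd


def ols_slope(xs, ys):
    """Slope from a least-squares regression of ys on a constant and xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Hypothetical annualized parameters, sampled monthly.
kappa, theta, sigma, dt = 0.5, 0.05, 0.02, 1 / 12
phi, mu, sd = vasicek_to_ar1(kappa, theta, sigma, dt)

rng = random.Random(0)
r = [theta]
for _ in range(50_000):
    r.append((1 - phi) * mu + phi * r[-1] + sd * rng.gauss(0.0, 1.0))

phi_hat = ols_slope(r[:-1], r[1:])
print(f"phi = exp(-kappa*dt) = {phi:.4f}; OLS estimate from simulated path = {phi_hat:.4f}")
```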


Equilibrium Interpretation of the Model

Our analysis has shown that the sign of the coefficient $\beta$ determines the sign of all bond risk premia. To understand this, consider the effects of a positive shock $\xi_{t+1}$, which increases the state variable $x_{t+1}$ and lowers all bond prices. When $\beta$ is positive, the shock also drives down $m_{t+1}$, so bond returns are positively correlated with the stochastic discount factor. This correlation has hedge value, so risk premia on bonds are negative. When $\beta$ is negative, on the other hand, bond returns are negatively correlated with the stochastic discount factor, and risk premia are positive.
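The sign argument can be illustrated by simulation. The sketch assumes the homoskedastic dynamics $-m_{t+1} = x_t + \beta\xi_{t+1}$ and $x_{t+1} = (1-\phi)\mu + \phi x_t + \xi_{t+1}$, with $\xi_{t+1} \sim N(0, \sigma^2)$. This is our restatement, since the defining equations lie outside this excerpt. The code draws one-period-ahead shocks and computes the realized log return $r_{n,t+1} = p_{n-1,t+1} - p_{nt}$. It then compares the Jensen-adjusted premium $E_t[r_{n,t+1}] - y_{1t} + \mathrm{Var}_t[r_{n,t+1}]/2$ with $-\mathrm{Cov}_t[r_{n,t+1}, m_{t+1}] = -\beta B_{n-1}\sigma^2$. Parameter values are hypothetical.

```python
import random


def coeffs(mu, phi, beta, sigma, n_max):
    """Assumed homoskedastic recursions for -p_n = A_n + B_n x (A_0 = B_0 = 0)."""
    A, B = [0.0], [0.0]
    for _ in range(n_max):
        A.append(A[-1] + (1 - phi) * mu * B[-1] - (beta + B[-1]) ** 2 * sigma**2 / 2)
        B.append(1 + phi * B[-1])
    return A, B


def adjusted_premium(mu, phi, beta, sigma, n, x, draws=200_000, seed=1):
    """Monte Carlo E[r_n] - y_1 + Var[r_n]/2, returned alongside the formula -beta * B_{n-1} * sigma**2."""
    A, B = coeffs(mu, phi, beta, sigma, n)
    y1 = A[1] + B[1] * x  # y_1t = -p_1t
    p_n = -(A[n] + B[n] * x)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        x_next = (1 - phi) * mu + phi * x + rng.gauss(0.0, sigma)
        p_next = -(A[n - 1] + B[n - 1] * x_next)  # p_{n-1,t+1}
        total += p_next - p_n  # realized log return r_{n,t+1}
    mc = total / draws - y1 + B[n - 1] ** 2 * sigma**2 / 2
    return mc, -beta * B[n - 1] * sigma**2


# Hypothetical parameters: a 10-period bond, evaluated at x_t = mu.
for beta in (2.0, -2.0):
    mc, theory = adjusted_premium(mu=0.05, phi=0.9, beta=beta, sigma=0.01, n=10, x=0.05)
    print(f"beta = {beta:+.1f}: simulated {mc:+.5f}, formula {theory:+.5f}")
```

A positive $\beta$ yields a negative premium and a negative $\beta$ a positive one, matching the hedge-value argument above.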
We can get more intuition by considering the case where the stochastic discount factor reflects the power utility function of a representative agent, as in Chapter 8. In this case $M_{t+1} = \delta(C_{t+1}/C_t)^{-\gamma}$, where $\delta$ is the discount factor and $\gamma$ is the risk-aversion coefficient of the representative agent. Taking logs, we have

$$ m_{t+1} = \log(\delta) - \gamma\,\Delta c_{t+1}. \tag{11.1.16} $$

It follows that $x_t = E_t[-m_{t+1}] = -\log(\delta) + \gamma E_t[\Delta c_{t+1}]$ and $\epsilon_{t+1} = -m_{t+1} - E_t[-m_{t+1}] = \gamma(\Delta c_{t+1} - E_t[\Delta c_{t+1}])$. Thus $x_t$ is a linear function of expected consumption growth, and $\epsilon_{t+1}$ is proportional to the innovation in consumption growth. The term-structure model of this section then implies that expected consumption growth is an AR(1) process, so that realized consumption growth is an ARMA(1,1) process. The coefficient $\beta$ governs the covariance between consumption innovations and revisions in expected future consumption growth. If $\beta$ is positive, a positive consumption shock today drives up expected future consumption growth and increases interest rates. The resulting fall in bond prices makes bonds covary negatively with consumption and gives them negative risk premia. If $\beta$ is negative, a positive shock to consumption lowers interest rates, so bonds have positive risk premia.

As in Chapter 9, $dB$ in (11.1.15) denotes the increment to a Brownian motion; it should not be confused with the bond price coefficients $B_n$ of this section.

Campbell (1986) explores the relation between bond risk premia and the time-series properties of consumption in a related model. Campbell's model is similar to the one here in that consumption and asset returns are conditionally lognormal and homoskedastic. It is more restrictive than the model here because it makes consumption growth (rather than expected consumption growth) a univariate stochastic process. It is more general, however, in that it does not require expected consumption growth to follow an AR(1) process. Campbell shows that the sign of the risk premium for an $n$-period bond depends on whether a consumption innovation raises or lowers consumption growth expected over $(n-1)$ periods. Backus and Zin (1994) explore this model in greater detail. Backus, Gregory, and Zin (1989) also relate bond risk premia to the time-series properties of consumption growth and interest rates.

Cox, Ingersoll, and Ross (1985a) show how to derive a continuous-time term-structure model like the one in this section from an underlying production model. Sun (1992) and Backus (1993) restate their results in discrete time. Assume that there is a representative agent with discount factor $\delta$ and time-separable log utility. Suppose that the agent faces a budget constraint of the form

$$ K_{t+1} = (K_t - C_t)\,X_t V_{t+1}, \tag{11.1.17} $$

where $K_t$ is capital at the start of the period, $(K_t - C_t)$ is invested capital, and $X_t V_{t+1}$ is the return on capital. This budget constraint has constant returns to scale because the return on capital does not depend on the level of capital. $X_t$ is the anticipated component of the return and $V_{t+1}$ is an unanticipated technology shock. With log utility it is well known that the agent chooses $C_t/K_t = 1 - \delta$. Substituting this into (11.1.17) and taking logs, we find that

$$ \Delta c_{t+1} = \log(\delta) + x_t + v_{t+1}, \tag{11.1.18} $$

where $x_t = \log(X_t)$, $v_{t+1} = \log(V_{t+1})$, and $-m_{t+1} = -\log(\delta) + \Delta c_{t+1} = x_t + v_{t+1}$. This derivation allows $x_t$ to follow any process, including the AR(1) assumed by the term-structure model.
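Once the consumption rule is imposed, the identity (11.1.18) follows mechanically from the budget constraint. The sketch below takes the log-utility rule $C_t/K_t = 1-\delta$ as given rather than deriving it. It draws arbitrary positive values for $X_t$ and $V_{t+1}$ and checks that log consumption growth equals $\log(\delta) + \log(X_t) + \log(V_{t+1})$. The discount factor and the distributions are hypothetical.

```python
import math
import random

# Hypothetical discount factor; X_t and V_{t+1} are arbitrary positive draws.
delta = 0.99
rng = random.Random(0)

K = 1.0
max_gap = 0.0
for _ in range(1000):
    X = math.exp(rng.gauss(0.01, 0.02))   # anticipated return component X_t
    V = math.exp(rng.gauss(0.0, 0.05))    # technology shock V_{t+1}
    C = (1 - delta) * K                   # log-utility consumption rule C_t / K_t = 1 - delta
    K_next = (K - C) * X * V              # budget constraint (11.1.17)
    C_next = (1 - delta) * K_next
    dc = math.log(C_next / C)
    predicted = math.log(delta) + math.log(X) + math.log(V)  # (11.1.18) with x_t = log X_t
    max_gap = max(max_gap, abs(dc - predicted))
    K = K_next

print("largest |dc - (log delta + x + v)|:", max_gap)
```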


11.1.2 A Square-Root Single-Factor Model


The homoskedastic model of the previous section is appealing because of its simplicity, but it has several unattractive features. First, it assumes that interest rate changes have constant variance. Second, the model allows interest rates to go negative. This makes it applicable to real interest rates, but less appropriate for nominal interest rates. Third, it implies that risk premia are constant over time, contrary to the evidence presented in Section 10.2.1 of Chapter 10. One can alter the model to handle these problems while retaining much of the simplicity of the basic structure. To do so, we allow the state variable $x_t$ to follow a conditionally lognormal but heteroskedastic square-root process. This change is entirely consistent with the equilibrium foundations for the model given in the previous section.

The square-root model, which is a discrete-time version of the famous Cox, Ingersoll, and Ross (1985a) continuous-time model, replaces (11.1.5) and (11.1.3) with

$$ -m_{t+1} = x_t + x_t^{1/2}\epsilon_{t+1} = x_t + x_t^{1/2}\beta\xi_{t+1}, \tag{11.1.19} $$

$$ x_{t+1} = (1-\phi)\mu + \phi x_t + x_t^{1/2}\xi_{t+1}. \tag{11.1.20} $$


The new element here is that the shock $\xi_{t+1}$ is multiplied by $x_t^{1/2}$. To understand the importance of this, recall that in the homoskedastic model $x_{t+i}$ and $m_{t+i}$ are normal conditional on $x_t$ for all $i \ge 1$. This means that one can analyze the homoskedastic model in either of two ways. One can take logs of (11.0.1) to get the recursive equation (11.1.1), or one can take logs of (11.0.2) to get an $n$-period loglinear equation:

$$ p_{nt} = E_t[m_{t+1} + \cdots + m_{t+n}] + \tfrac{1}{2}\mathrm{Var}_t[m_{t+1} + \cdots + m_{t+n}]. \tag{11.1.21} $$

Calculations based on (11.1.21) are more cumbersome than the analysis presented in the previous section, but they give the same result. In the square-root model, by contrast, $x_{t+1}$ and $m_{t+1}$ are normal conditional on $x_t$, but $x_{t+i}$ and $m_{t+i}$ are nonnormal conditional on $x_t$ for all $i > 1$. This means that one can only analyze the square-root model using the recursive equation (11.1.1); the $n$-period loglinear relation (11.1.21) does not hold in the square-root model.

Proceeding with the recursive analysis as before, we can determine the price of a one-period bond by substituting (11.1.19) into (11.1.1) to get

$$ p_{1t} = E_t[m_{t+1}] + \tfrac{1}{2}\mathrm{Var}_t[m_{t+1}] = -x_t\left(1 - \frac{\beta^2\sigma^2}{2}\right). \tag{11.1.22} $$

The one-period bond yield $y_{1t} = -p_{1t}$ is now proportional to the state variable $x_t$. Once again, the short rate measures the state of the economy in the model.

Since the short rate is proportional to the state variable, it inherits the property that its conditional variance is proportional to its level. Many authors have noted that interest rate volatility tends to be higher when interest rates are high; in Section 11.2.2 we discuss the empirical evidence on this point. This property also makes it hard for the interest rate to go negative, since the upward drift in the state variable tends to dominate the random shocks as $x_t$ declines towards zero. Cox, Ingersoll, and Ross (1985a) show that negative interest rates are ruled out in the continuous-time version of this model. In that version, the instantaneous interest rate follows the process $dr = \kappa(\theta - r)\,dt + \sigma r^{1/2}\,dB$. Time-variation in volatility also produces time-variation in term premia, so the log expectations hypothesis no longer holds in this model.

We now guess that the price function for an $n$-period bond has the same linear form as before, $-p_{nt} = A_n + B_n x_t$, equation (11.1.8). In this model $A_0 = B_0 = 0$, $A_1 = 0$, and $B_1 = 1 - \beta^2\sigma^2/2$. It is straightforward to verify the guess and to show that $A_n$ and $B_n$ obey

$$
\begin{aligned}
B_n &= 1 + \phi B_{n-1} - (\beta + B_{n-1})^2\sigma^2/2,\\
A_n - A_{n-1} &= (1-\phi)\mu B_{n-1}.
\end{aligned}
\tag{11.1.23}
$$

Comparing (11.1.23) with (11.1.11), we see that the term in $\sigma^2$ has moved from the equation describing $A_n$ to the equation describing $B_n$. This is because the variance is now proportional to the state variable, so it affects the slope coefficient rather than the intercept coefficient for the bond price. The limiting value of $B_n$, which we write as $B$, is now the solution to a quadratic equation. For realistic parameter values, however, this solution is close to the limit $1/(1-\phi)$ from the previous model. Thus $B_n$ is positive and increasing in $n$.
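The limiting coefficient can be computed in two ways. One is to iterate (11.1.23). The other is to set $B_n = B_{n-1} = B$ and solve the resulting quadratic, $(\sigma^2/2)B^2 + (1-\phi+\beta\sigma^2)B + (\beta^2\sigma^2/2 - 1) = 0$. The sketch below does both with hypothetical parameter values and compares the result with $1/(1-\phi)$. We take the positive root, which for these values is the one the iteration starting from $B_0 = 0$ reaches.

```python
import math


def sqrt_model_B(phi, beta, sigma, n_max):
    """B_n from (11.1.23): B_n = 1 + phi*B_{n-1} - (beta + B_{n-1})**2 * sigma**2 / 2, with B_0 = 0."""
    B = [0.0]
    for _ in range(n_max):
        B.append(1 + phi * B[-1] - (beta + B[-1]) ** 2 * sigma**2 / 2)
    return B


def limiting_B(phi, beta, sigma):
    """Positive root of (sigma^2/2) B^2 + (1 - phi + beta*sigma^2) B + (beta^2 sigma^2/2 - 1) = 0."""
    a = sigma**2 / 2
    b = 1 - phi + beta * sigma**2
    c = beta**2 * sigma**2 / 2 - 1
    disc = math.sqrt(b * b - 4 * a * c)
    return 2 * c / (-b - disc)  # algebraically equal to (-b + disc) / (2a), but avoids cancellation


# Hypothetical parameters.
phi, beta, sigma = 0.95, -2.0, 0.015
B = sqrt_model_B(phi, beta, sigma, n_max=2000)
B_inf = limiting_B(phi, beta, sigma)
print("B_2000 =", B[-1], " quadratic root =", B_inf, " 1/(1-phi) =", 1 / (1 - phi))
```

Here the limit is about 19.3 against 20 for the homoskedastic benchmark, and the sequence $B_n$ increases monotonically towards it.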

The expected excess log bond return in the square-root model is given by

$$
\begin{aligned}
E_t[r_{n,t+1}] - y_{1t} &= -\mathrm{Cov}_t[r_{n,t+1}, m_{t+1}] - \mathrm{Var}_t[r_{n,t+1}]/2\\
&= B_{n-1}\mathrm{Cov}_t[x_{t+1}, m_{t+1}] - B_{n-1}^2\mathrm{Var}_t[x_{t+1}]/2\\
&= \left(-\beta B_{n-1} - B_{n-1}^2/2\right)\sigma^2 x_t.
\end{aligned}
\tag{11.1.24}
$$

The first two equalities here are the same as in the previous model. The third equality is the formula from the previous model, (11.1.12), multiplied by the state variable $x_t$. Thus the expected log excess return is proportional to the state variable $x_t$ or, equivalently, to the short interest rate $y_{1t}$. This is the expected result, since the conditional variance of interest rates is proportional to $x_t$. Once again, the sign of $\beta$ determines the sign of the risk premium term in (11.1.24). The price of interest rate risk is the ratio of two quantities: the expected excess log return on a bond plus one half its own variance (to adjust for Jensen's Inequality), and the standard deviation of the excess log return on the bond. Since the standard deviation of excess bond returns is proportional to the square root of $x_t$, this price of risk is also proportional to the square root of $x_t$.

Depending on the parameter values, it may be possible for the interest rate to be zero in the continuous-time model. Longstaff (1992) discusses alternative ways to model this possibility.
The forward rate in the square-root model is given by

$$
\begin{aligned}
f_{nt} &= y_{1t} + B_n E_t[\Delta x_{t+1}] + B_n\mathrm{Cov}_t[x_{t+1}, m_{t+1}] - B_n^2\mathrm{Var}_t[x_{t+1}]/2\\
&= \left(1 - \beta^2\sigma^2/2\right)x_t - B_n\left[(1-\phi)(x_t - \mu) + x_t\beta\sigma^2\right] - B_n^2\sigma^2 x_t/2.
\end{aligned}
\tag{11.1.25}
$$

The first equality in (11.1.25) is the same as in the homoskedastic model, while the second equality multiplies variance terms by $x_t$ where appropriate. It can be shown that the square-root model permits the same range of shapes for the yield curve (upward-sloping, inverted, and humped) as the homoskedastic model.
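As a consistency check, the second line of (11.1.25) should agree with forward rates built directly from bond prices, $f_{nt} = (A_{n+1}-A_n) + (B_{n+1}-B_n)x_t$. The sketch below computes both from the recursions (11.1.23) for a few positive values of $x_t$. Parameter values are hypothetical.

```python
def sqrt_coeffs(mu, phi, beta, sigma, n_max):
    """Square-root model recursions (11.1.23), starting from A_0 = B_0 = 0."""
    A, B = [0.0], [0.0]
    for _ in range(n_max):
        A.append(A[-1] + (1 - phi) * mu * B[-1])
        B.append(1 + phi * B[-1] - (beta + B[-1]) ** 2 * sigma**2 / 2)
    return A, B


def forward_from_prices(A, B, n, x):
    # f_n = p_n - p_{n+1}, with -p_n = A_n + B_n x
    return (A[n + 1] - A[n]) + (B[n + 1] - B[n]) * x


def forward_formula(B, n, x, mu, phi, beta, sigma):
    # Second line of (11.1.25)
    return ((1 - beta**2 * sigma**2 / 2) * x
            - B[n] * ((1 - phi) * (x - mu) + x * beta * sigma**2)
            - B[n] ** 2 * sigma**2 * x / 2)


# Hypothetical parameters; x must stay positive in the square-root model.
mu, phi, beta, sigma = 0.005, 0.95, -2.0, 0.015
A, B = sqrt_coeffs(mu, phi, beta, sigma, n_max=120)
max_err = max(abs(forward_from_prices(A, B, n, x) - forward_formula(B, n, x, mu, phi, beta, sigma))
              for x in (0.001, 0.005, 0.02) for n in range(120))
print("max discrepancy between the two forward-rate calculations:", max_err)
```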

Pearson and Sun (1994) have shown that the square-root model can be generalized to allow the variance of the state variable to be linear in the level of the state variable, rather than proportional to it. One simply replaces the $x_t^{1/2}$ terms multiplying the shocks in (11.1.19) and (11.1.20) with terms of the form $(\alpha_0 + \alpha_1 x_t)^{1/2}$. The resulting model is tractable because it remains in the affine-yield class. It nests both the homoskedastic model (the case $\alpha_0 = 1$, $\alpha_1 = 0$) and the basic square-root model (the case $\alpha_0 = 0$, $\alpha_1 = 1$).


11.1.3 A Two-Factor Model 


So far we have only considered single-factor models. Such models imply 
that all bond returns are perfectly correlated. While bond returns do tend 
to be highly correlated, their correlations are certainly not one and so it is 
natural to ask how this implication can be avoided. 

We now present a simple model in which there are two factors rather than one, so that bond returns are no longer perfectly correlated. The model is a discrete-time version of the model of Longstaff and Schwartz (1992). It replaces (11.1.19) with

$$ -m_{t+1} = x_{1t} + x_{2t} + x_{1t}^{1/2}\epsilon_{t+1}, \tag{11.1.26} $$

and replaces (11.1.20) with a pair of equations for the state variables:

$$ x_{1,t+1} = (1-\phi_1)\mu_1 + \phi_1 x_{1t} + x_{1t}^{1/2}\xi_{1,t+1}, \tag{11.1.27} $$

$$ x_{2,t+1} = (1-\phi_2)\mu_2 + \phi_2 x_{2t} + x_{2t}^{1/2}\xi_{2,t+1}. \tag{11.1.28} $$

Although bond returns are not perfectly correlated in this model, the covariance matrix of bond returns has rank two and hence is singular whenever we observe more than two bonds. We discuss this point further in Section 11.1.4.

Finally, the relation between the shocks is

$$ \epsilon_{t+1} = \beta\xi_{1,t+1}, \tag{11.1.29} $$

and the shocks $\xi_{1,t+1}$ and $\xi_{2,t+1}$ are uncorrelated with each other. We will write $\sigma_1^2$ for the variance of $\xi_{1,t+1}$ and $\sigma_2^2$ for the variance of $\xi_{2,t+1}$.

In this model, minus the log stochastic discount factor is forecast by two state variables, $x_{1t}$ and $x_{2t}$. The variance of the innovation to the log stochastic discount factor is proportional to the level of $x_{1t}$, as in the square-root model, and each of the two state variables follows a square-root autoregressive process. Finally, the log stochastic discount factor is conditionally correlated with $x_{1,t+1}$ but not with $x_{2,t+1}$. This last assumption is required to keep the model in the tractable affine-yield class. Note that the two-factor model nests the single-factor square-root model, which can be obtained by setting $x_{2t} = 0$, but does not nest the single-factor homoskedastic model.

Proceeding in the usual way, we find that the price of a one-period bond is

$$ p_{1t} = E_t[m_{t+1}] + \tfrac{1}{2}\mathrm{Var}_t[m_{t+1}] = -x_{1t} - x_{2t} + x_{1t}\beta^2\sigma_1^2/2. \tag{11.1.30} $$

The one-period bond yield $y_{1t} = -p_{1t}$ is no longer proportional to the state variable $x_{1t}$, because it depends also on $x_{2t}$. The short interest rate alone is therefore not sufficient to measure the state of the economy in this model. Longstaff and Schwartz (1992) point out, however, that the conditional variance of the short rate is a different linear function of the two state variables:

$$ \mathrm{Var}_t[y_{1,t+1}] = \left(1 - \beta^2\sigma_1^2/2\right)^2\sigma_1^2 x_{1t} + \sigma_2^2 x_{2t}. \tag{11.1.31} $$

Thus the short rate and its conditional volatility summarize the state of the economy, and one can always state the model in terms of these two variables.
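Equations (11.1.30) and (11.1.31) are two linear equations in $x_{1t}$ and $x_{2t}$. The states can therefore be recovered from the short rate and its conditional variance whenever the 2×2 system is nonsingular. The minimal sketch below uses hypothetical parameter values and a crude, unit-dependent singularity threshold.

```python
def short_rate_and_variance(x1, x2, beta, sigma1, sigma2):
    """Map the states to (y_1t, Var_t[y_1,t+1]) using (11.1.30) and (11.1.31)."""
    k1 = 1 - beta**2 * sigma1**2 / 2
    return k1 * x1 + x2, k1**2 * sigma1**2 * x1 + sigma2**2 * x2


def states_from_short_rate(y1, v, beta, sigma1, sigma2):
    """Solve [[k1, 1], [k1^2 s1^2, s2^2]] (x1, x2) = (y1, v) by Cramer's rule."""
    k1 = 1 - beta**2 * sigma1**2 / 2
    det = k1 * sigma2**2 - k1**2 * sigma1**2
    if abs(det) < 1e-15:  # crude threshold for this illustration
        raise ValueError("short rate and its variance do not identify the states")
    x1 = (sigma2**2 * y1 - v) / det
    x2 = (k1 * v - k1**2 * sigma1**2 * y1) / det
    return x1, x2


# Hypothetical parameters and states.
beta, sigma1, sigma2 = -2.0, 0.05, 0.02
x1, x2 = 0.004, 0.002
y1, v = short_rate_and_variance(x1, x2, beta, sigma1, sigma2)
print("recovered states:", states_from_short_rate(y1, v, beta, sigma1, sigma2))
```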

We guess that the price function for an $n$-period bond is linear in the two state variables: $-p_{nt} = A_n + B_{1n}x_{1t} + B_{2n}x_{2t}$. We already know that $A_0 = B_{10} = B_{20} = 0$, $A_1 = 0$, $B_{11} = 1 - \beta^2\sigma_1^2/2$, and $B_{21} = 1$. It is straightforward to show that $A_n$, $B_{1n}$, and $B_{2n}$ obey

$$
\begin{aligned}
B_{1n} &= 1 + \phi_1 B_{1,n-1} - (\beta + B_{1,n-1})^2\sigma_1^2/2,\\
B_{2n} &= 1 + \phi_2 B_{2,n-1} - B_{2,n-1}^2\sigma_2^2/2,\\
A_n - A_{n-1} &= (1-\phi_1)\mu_1 B_{1,n-1} + (1-\phi_2)\mu_2 B_{2,n-1}.
\end{aligned}
\tag{11.1.32}
$$

The difference equation for $B_{1n}$ is the same as in the single-factor square-root model, (11.1.23). The difference equation for $B_{2n}$, by contrast, includes only a term in the own variance of $x_2$, because $x_2$ is uncorrelated with $m$ and does not affect the variance of $m$. The difference equation for $A_n$ is just the sum of two terms, each of which has the familiar form from the single-factor square-root model.

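The expected excess log bond return in the two-factor model is given by

$$
\begin{aligned}
E_t[r_{n,t+1}] - y_{1t} &= -\mathrm{Cov}_t[r_{n,t+1}, m_{t+1}] - \mathrm{Var}_t[r_{n,t+1}]/2\\
&= B_{1,n-1}\mathrm{Cov}_t[x_{1,t+1}, m_{t+1}] - B_{1,n-1}^2\mathrm{Var}_t[x_{1,t+1}]/2 - B_{2,n-1}^2\mathrm{Var}_t[x_{2,t+1}]/2\\
&= \left(-\beta B_{1,n-1} - B_{1,n-1}^2/2\right)\sigma_1^2 x_{1t} - B_{2,n-1}^2\sigma_2^2 x_{2t}/2.
\end{aligned}
\tag{11.1.33}
$$

This is the same as in the square-root model, with the addition of an extra term, arising from Jensen's Inequality, in the variance of $x_{2,t+1}$.

The forward rate in the two-factor model is given by

$$
\begin{aligned}
f_{nt} &= y_{1t} + B_{1n}E_t[\Delta x_{1,t+1}] + B_{1n}\mathrm{Cov}_t[x_{1,t+1}, m_{t+1}] + B_{2n}E_t[\Delta x_{2,t+1}]\\
&\qquad - B_{1n}^2\mathrm{Var}_t[x_{1,t+1}]/2 - B_{2n}^2\mathrm{Var}_t[x_{2,t+1}]/2\\
&= \left(1 - \beta^2\sigma_1^2/2\right)x_{1t} - B_{1n}\left[(1-\phi_1)(x_{1t} - \mu_1) + x_{1t}\beta\sigma_1^2\right] - B_{1n}^2\sigma_1^2 x_{1t}/2\\
&\qquad + x_{2t} - B_{2n}(1-\phi_2)(x_{2t} - \mu_2) - B_{2n}^2\sigma_2^2 x_{2t}/2.
\end{aligned}
\tag{11.1.34}
$$

This is the obvious generalization of the square-root model. Importantly, it can generate more complicated shapes for the yield curve, including inverted hump shapes, because the independent movements of both $x_{1t}$ and $x_{2t}$ affect the term structure.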

The analysis of this model illustrates an important principle. As Cox, Ingersoll, and Ross (1985a) and Dybvig (1989) have emphasized, under certain circumstances one can construct multifactor term-structure models simply by "adding up" single-factor models. Whenever the log stochastic discount factor $m_{t+1}$ can be written as the sum of two independent processes, the resulting term structure is the sum of the term structures that would exist under each of these processes. In the Longstaff and Schwartz (1992) model, the log stochastic discount factor is the sum of $-x_{1t} - x_{1t}^{1/2}\beta\xi_{1,t+1}$ and $-x_{2t}$, and these components are independent of each other. Inspection of (11.1.34) shows that the resulting term structure is just the sum of two pieces. The first is a general square-root term structure driven by the $x_{1t}$ process. The second is a special square-root term structure with the parameter restriction $\beta = 0$, driven by the $x_{2t}$ process.
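The "adding up" principle can be checked directly. The two-factor coefficients in (11.1.32) should coincide with those of two separate single-factor square-root models, the second with $\beta = 0$, and the intercepts should add. The sketch below uses hypothetical parameters and evaluates both versions of $-p_{nt}$ at one pair of state values.

```python
def single_factor_sqrt(mu, phi, beta, sigma, n_max):
    """Single-factor square-root recursions (11.1.23)."""
    A, B = [0.0], [0.0]
    for _ in range(n_max):
        A.append(A[-1] + (1 - phi) * mu * B[-1])
        B.append(1 + phi * B[-1] - (beta + B[-1]) ** 2 * sigma**2 / 2)
    return A, B


def two_factor(mu1, phi1, sigma1, mu2, phi2, sigma2, beta, n_max):
    """Two-factor recursions (11.1.32)."""
    A, B1, B2 = [0.0], [0.0], [0.0]
    for _ in range(n_max):
        A.append(A[-1] + (1 - phi1) * mu1 * B1[-1] + (1 - phi2) * mu2 * B2[-1])
        B1.append(1 + phi1 * B1[-1] - (beta + B1[-1]) ** 2 * sigma1**2 / 2)
        B2.append(1 + phi2 * B2[-1] - B2[-1] ** 2 * sigma2**2 / 2)
    return A, B1, B2


# Hypothetical parameters.
p1 = dict(mu=0.004, phi=0.95, sigma=0.015)
p2 = dict(mu=0.002, phi=0.80, sigma=0.03)
beta, n_max = -2.0, 120

A, B1, B2 = two_factor(p1["mu"], p1["phi"], p1["sigma"], p2["mu"], p2["phi"], p2["sigma"], beta, n_max)
Aa, Ba = single_factor_sqrt(beta=beta, n_max=n_max, **p1)
Ab, Bb = single_factor_sqrt(beta=0.0, n_max=n_max, **p2)  # second component: beta = 0

x1, x2 = 0.004, 0.001
gap = max(abs((A[n] + B1[n] * x1 + B2[n] * x2) - ((Aa[n] + Ba[n] * x1) + (Ab[n] + Bb[n] * x2)))
          for n in range(n_max + 1))
print("max |two-factor -p_n minus sum of single-factor -p_n|:", gap)
```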


11.1.4 Beyond Affine-Yield Models 


We have considered a sequence of models, each of which turns out to have the property that log bond yields are linear, or affine, in the underlying state variables. Brown and Schaefer (1991) and Duffie and Kan (1993) have clarified the primitive assumptions necessary to get an affine-yield model. In the discrete-time framework used here, these conditions are most easily stated in terms of a vector $\mathbf{x}_t$. This vector contains the log stochastic discount factor $m_t$ and the time-$t$ values of the state variables relevant for forecasting future $m_{t+i}$, $i = 1 \ldots n$. Suppose the conditional forecast of $\mathbf{x}$ one period ahead, $E_t[\mathbf{x}_{t+1}]$, is affine in the state variables. Suppose also that the conditional distribution of $\mathbf{x}$ one period ahead is normal, with a variance-covariance matrix $\mathrm{Var}_t[\mathbf{x}_{t+1}]$ that is affine in the state variables. Then the resulting term-structure model is an affine-yield model.

To see this, consider the steps we used to derive the implications of each successive term-structure model. We first calculated the log short-term interest rate. This rate is affine in the underlying state variables if $m_{t+1}$ is conditionally normal and $E_t[m_{t+1}]$ and $\mathrm{Var}_t[m_{t+1}]$ are affine in the state variables. We next guessed that log bond yields were affine and proceeded to verify the guess. If yields are affine, and if $\mathbf{x}$ is conditionally normal with an affine variance-covariance matrix, then the risk premium on any bond is affine. Finally, we derived log forward rates, which are affine if the short rate, the risk premium, and the expected change in the state variable are all affine. Affine forward rates imply affine yields, verifying that the model is in the affine-yield class.

Brown and Schaefer (1991) and Duffie and Kan (1993) state conditions on the short rate that deliver an affine-yield model in a continuous-time setting. They show that two quantities must both be affine to get an affine-yield model. The first is the risk-adjusted drift in the short rate, that is, the expected change in the short rate less the covariance of the short rate with the stochastic discount factor. The second is the variance of the short rate. The models of Vasicek (1977), Cox, Ingersoll, and Ross (1985a), and Pearson and Sun (1994) satisfy these requirements, but some other continuous-time models, such as that of Brennan and Schwartz (1979), do not.

Affine-yield models have a number of desirable properties that help to explain their appeal. First, log bond yields inherit the conditional normality assumed for the underlying state variables. Second, because log bond yields are linear functions of the state variables, we can renormalize the model so that the yields themselves are the state variables. This is obvious in a one-factor model where the short rate is the state variable, but it is equally possible in a model with any number of factors. Longstaff and Schwartz (1992) present their two-factor model as one in which the volatility of the short rate and the level of the short rate are the factors. The model could be written equally well in terms of any two bond yields of fixed maturities. Third, affine-yield models with $K$ state variables imply that the term structure of interest rates can be summarized by two ingredients. These are the levels of $K$ bond yields at each point in time and the constant coefficients relating other bond yields to the $K$ basis yields. In this sense affine-yield models are linear. Their nonlinearity is confined to the process governing the intertemporal evolution of the $K$ basis yields and to the relation between the cross-sectional coefficients and the underlying parameters of the model.

Affine-yield models also have some disadvantages. The linear relations among bond yields mean that the covariance matrix of bond returns has rank $K$. Equivalently, we can perfectly fit the return on any bond using a regression on $K$ other contemporaneous bond returns. This implication will always be rejected by a data set containing more than $K$ bonds, unless we add extra error terms to the model. Affine-yield models also limit the way in which interest rate volatility can change with the level of interest rates. For example, a model in which volatility is proportional to the square of the interest rate is not affine. Finally, as Constantinides (1992) emphasizes, single-factor affine-yield models imply that risk premia on long-term bonds always have the same sign.
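The rank-$K$ property is easy to see in a simulated one-factor ($K = 1$) homoskedastic model. Any two yields are exact affine functions of the single state variable, so regressing one on the other gives $R^2 = 1$. Adding even modest measurement errors breaks the exact fit. The sketch works with yield levels rather than returns. It uses homoskedastic recursions that we restate as an assumption in the code, together with hypothetical parameters and a hypothetical error size.

```python
import random


def r_squared(xs, ys):
    """R^2 from a simple regression of ys on a constant and xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical one-factor homoskedastic model: -p_n = A_n + B_n x, so y_n = (A_n + B_n x) / n.
# Assumed recursions: B_n = 1 + phi*B_{n-1}; A_n - A_{n-1} = (1-phi)*mu*B_{n-1} - (beta + B_{n-1})**2 * sigma**2 / 2.
mu, phi, beta, sigma = 0.005, 0.95, -2.0, 0.001
A, B = [0.0], [0.0]
for _ in range(10):
    A.append(A[-1] + (1 - phi) * mu * B[-1] - (beta + B[-1]) ** 2 * sigma**2 / 2)
    B.append(1 + phi * B[-1])

rng = random.Random(42)
x, y2, y10 = mu, [], []
for _ in range(5000):
    x = (1 - phi) * mu + phi * x + rng.gauss(0.0, sigma)
    y2.append((A[2] + B[2] * x) / 2)
    y10.append((A[10] + B[10] * x) / 10)

# Hypothetical measurement error (sd of 5 basis points per period) added to both yields.
noise = 5e-4
y2_obs = [v + rng.gauss(0.0, noise) for v in y2]
y10_obs = [v + rng.gauss(0.0, noise) for v in y10]

print("R^2, model yields:", r_squared(y2, y10))
print("R^2, noisy yields:", r_squared(y2_obs, y10_obs))
```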

If we move outside the affine-yield class of models, we can no longer work with equation (11.1.1). Instead we must return to the underlying nonlinear difference equation (11.0.1) or its $n$-period representation (11.0.2). In general these equations must be solved numerically. One common method is to set up a binomial tree for the short-term interest rate. Black, Derman, and Toy (1990) and Black and Karasinski (1991), for example, assume that the simple one-period yield $Y_{1t}$ is conditionally lognormal. Affine-yield models, by contrast, assume that $(1 + Y_{1t})$ is conditionally lognormal. These authors use a binomial tree to solve their models for the implied term structure of interest rates. Constantinides (1992), however, presents a model that can be solved in closed form. His model makes the log stochastic discount factor a sum of noncentral chi-squared random variables rather than a normal random variable. Constantinides is then able to calculate the expectations in (11.0.2) analytically.


11.2 Fitting Term-Structure Models to the Data


11.2.1 Real Bonds, Nominal Bonds, and Inflation

The term-structure models described so far apply to bonds whose payoffs are riskless in real terms. Almost all actual bonds instead have payoffs that are riskless in nominal terms. We now discuss how the models can be adapted to deal with this fact.

To study nominal bonds we need to introduce some new notation. We write the nominal price index at time $t$ as $Q_t$, and the gross rate of inflation from $t$ to $t+1$ as $\Pi_{t+1} = Q_{t+1}/Q_t$. We have already defined $P_{nt}$ to be the real price of an $n$-period real bond that pays one goods unit at time $t+n$. We now define $P^{\$}_{nt}$ to be the nominal price of an $n$-period nominal bond that pays one dollar at time $t+n$. From these definitions it follows that the nominal price of an $n$-period real bond is $P_{nt}Q_t$, and the real price of an $n$-period nominal bond is $P^{\$}_{nt}/Q_t$. We do not adopt any special notation for these last two concepts.


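If we now apply the general asset pricing condition,

$$ 1 = E_t[(1 + R_{i,t+1})M_{t+1}], $$

to the real return on an $n$-period nominal bond, we find that

$$ \frac{P^{\$}_{nt}}{Q_t} = E_t\left[\frac{P^{\$}_{n-1,t+1}}{Q_{t+1}}\,M_{t+1}\right]. \tag{11.2.1} $$

Multiplying through by $Q_t$, we have

$$ P^{\$}_{nt} = E_t\left[P^{\$}_{n-1,t+1}\,M_{t+1}\,\frac{Q_t}{Q_{t+1}}\right] = E_t\left[P^{\$}_{n-1,t+1}\,\frac{M_{t+1}}{\Pi_{t+1}}\right] = E_t\left[P^{\$}_{n-1,t+1}\,M^{\$}_{t+1}\right], \tag{11.2.2} $$

where $M^{\$}_{t+1} \equiv M_{t+1}/\Pi_{t+1}$ can be thought of as a nominal stochastic discount factor that prices nominal returns.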

The empirical literature on nominal bonds uses this result in one of two ways. The first approach is to take the primitive assumptions that we made about $M_{t+1}$ in Section 11.1 and apply them instead to $M^{\$}_{t+1}$. The real term-structure models of the last section are then reinterpreted as nominal term-structure models. Brown and Dybvig (1986), for example, do this when they apply the Cox, Ingersoll, and Ross (1985a) square-root model directly to data on US nominal bond prices. The square-root model restricts interest rates to be positive, and in this respect it is more appropriate for nominal interest rates than for real interest rates.

Some governments, notably those of Canada, Israel, and the UK, have issued bonds whose nominal payoffs are linked to a nominal price index. In 1996 the US Treasury is considering issuing similar securities. These index-linked bonds approximate real bonds but are rarely exactly equivalent to real bonds. Brown and Schaefer (1994) give a lucid discussion of the imperfections in the UK indexing system, and apply the Cox, Ingersoll, and Ross (1985a) model to UK index-linked bonds. See also Barr and Campbell (1995) and Campbell and Shiller (1996).

The second approach is to assume that the two components of the nominal stochastic discount factor, $M_{t+1}$ and $\Pi_{t+1}$, are independent of each other. To see how this assumption facilitates empirical work, take logs of the nominal stochastic discount factor to get

$$ m^{\$}_{t+1} = m_{t+1} - \pi_{t+1}. \tag{11.2.3} $$

When the components $m_{t+1}$ and $\pi_{t+1}$ are independent, we can price nominal bonds by using the insights of Cox, Ingersoll, and Ross (1985a) and Dybvig (1989). Recall their result from Section 11.1.3. In a model with two independent components of the stochastic discount factor, the log bond price is the sum of the log bond prices implied by each component. We can, for example, apply the Longstaff and Schwartz (1992) model to nominal bonds. To do so, we assume that $m_{t+1}$ is described by a square-root single-factor model, $-m_{t+1} = x_{1t} + x_{1t}^{1/2}\beta\xi_{1,t+1}$, and that $\pi_{t+1}$ is known at $t$ and equal to a state variable $x_{2t}$. We then get $-m^{\$}_{t+1} = -m_{t+1} + \pi_{t+1} = x_{1t} + x_{1t}^{1/2}\beta\xi_{1,t+1} + x_{2t}$, and the Longstaff-Schwartz model describes nominal bonds.

More generally, the assumption that $M_{t+1}$ and $\Pi_{t+1}$ are independent has two implications. First, prices of nominal bonds are just prices of real bonds multiplied by the expectation of the future real value of money. Second, expected real returns on nominal bonds are the same as expected real returns on real bonds. To see this, consider equation (11.2.2) with maturity $n = 1$. Note that the independence of $M_{t+1}$ and $1/\Pi_{t+1}$ allows us to replace the expectation of their product by the product of their expectations:

$$ P^{\$}_{1t} = E_t\left[\frac{M_{t+1}}{\Pi_{t+1}}\right] = E_t[M_{t+1}]\,E_t\left[\frac{1}{\Pi_{t+1}}\right] = P_{1t}\,Q_t\,E_t\left[\frac{1}{Q_{t+1}}\right], \tag{11.2.4} $$

since $P_{1t} = E_t[M_{t+1}]$ and $\Pi_{t+1} = Q_{t+1}/Q_t$. Thus the nominal price of a bond that pays one dollar tomorrow is the nominal price of a bond that pays one unit of goods tomorrow, times the expectation of the real value of one dollar tomorrow.
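The factorization in (11.2.4) can be checked by simulation. Purely for illustration, the sketch below assumes that $\log M_{t+1}$ and $\log \Pi_{t+1}$ are independent normals with hypothetical moments $(\mu_m, \sigma_m)$ and $(\mu_\pi, \sigma_\pi)$. Under that assumption the nominal one-period price also has the closed form $\exp(\mu_m - \mu_\pi + (\sigma_m^2 + \sigma_\pi^2)/2)$.

```python
import math
import random

# Hypothetical lognormal one-period SDF and inflation, drawn independently.
mean_m, sd_m = -0.01, 0.02    # log M_{t+1}
mean_pi, sd_pi = 0.005, 0.01  # log Pi_{t+1}

rng = random.Random(7)
draws = 200_000
sum_nominal = sum_real = sum_money = 0.0
for _ in range(draws):
    M = math.exp(rng.gauss(mean_m, sd_m))
    inv_pi = math.exp(-rng.gauss(mean_pi, sd_pi))  # 1 / Pi_{t+1} = Q_t / Q_{t+1}
    sum_nominal += M * inv_pi  # nominal SDF M_{t+1} / Pi_{t+1}
    sum_real += M
    sum_money += inv_pi

P1_nominal = sum_nominal / draws         # estimate of the nominal price E[M / Pi]
P1_real = sum_real / draws               # estimate of the real price E[M]
real_value_of_money = sum_money / draws  # estimate of E[Q_t / Q_{t+1}]

# Lognormal closed form under independence.
exact = math.exp(mean_m - mean_pi + (sd_m**2 + sd_pi**2) / 2)
print(P1_nominal, P1_real * real_value_of_money, exact)
```

If the two draws were made correlated, the closed form would pick up an extra covariance term, and the product of expectations would no longer equal the expectation of the product.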


We now guess that a similar relationship holds for all maturities $n$, and we prove this by induction. If the $(n-1)$-period relationship holds, $P^{\$}_{n-1,t+1} = P_{n-1,t+1}\,Q_{t+1}\,E_{t+1}[1/Q_{t+n}]$, then

$$
\begin{aligned}
P^{\$}_{nt} &= E_t\left[P^{\$}_{n-1,t+1}\,M_{t+1}\,\frac{Q_t}{Q_{t+1}}\right]
= E_t\left[P_{n-1,t+1}\,M_{t+1}\,Q_t\,E_{t+1}\!\left[\frac{1}{Q_{t+n}}\right]\right]\\
&= P_{nt}\,Q_t\,E_t\left[\frac{1}{Q_{t+n}}\right],
\end{aligned}
\tag{11.2.5}
$$

where the last equality uses three facts. The first is the independence of real variables from the price level, which enables us to replace the expectation of a product by the product of expectations. The second is the law of iterated expectations. The third is the fact that $P_{nt} = E_t[P_{n-1,t+1}M_{t+1}]$. Equation (11.2.5) is the desired result. It says that the nominal price of a bond that pays one dollar at time $t+n$ is the nominal price of a bond that pays one unit of goods at time $t+n$, times the expected real value of one dollar at time $t+n$. Dividing (11.2.5) by $Q_t$, we can see that the same relationship holds between the real prices of nominal bonds and the real prices of real bonds. Further, (11.2.5) implies that the expected real return on a nominal bond equals the expected real return on a real bond:

$$ E_t\left[\frac{P^{\$}_{n-1,t+1}/Q_{t+1}}{P^{\$}_{nt}/Q_t}\right] = E_t\left[\frac{P_{n-1,t+1}}{P_{nt}}\right]. \tag{11.2.6} $$

Gibbons and Ramaswamy (1993) use these results to test the implications of real term-structure models for econometric forecasts of real returns on nominal bonds.

Although it is extremely convenient to assume that inflation is independent of the real stochastic discount factor, this assumption may be unrealistic. Three studies find that innovations to expected inflation are negatively correlated in the short run with innovations to expected future real interest rates. Barr and Campbell (1995) use UK data on indexed and nominal bonds, Campbell and Ammer (1993) apply rational-expectations methodology to US data, and Pennacchi (1991) uses survey data. More directly, Campbell and Shiller (1996) find that inflation innovations are correlated with stock returns and real consumption growth. These are the proxies for the stochastic discount factor suggested by the traditional CAPM of Chapter 5 and the consumption CAPM of Chapter 8.




11.2.2 Empirical Evidence on Affine-Yield Models 


All the models we have discussed so far need additional error terms if they are to fit the data. To see why, consider a model in which the real stochastic discount factor is driven by a single state variable. In such a model, returns on all real bonds are perfectly correlated because the model has only a single shock. Similarly, returns on all nominal bonds are perfectly correlated in any model where a single state variable drives the nominal stochastic discount factor. In reality there are no deterministic linear relationships among returns on different bonds, so these implications are bound to be rejected by the data. Adding extra state variables increases the rank of the variance-covariance matrix of bond returns from one to $K$, where $K$ is the number of state variables. Whenever there are more than $K$ bonds, however, the matrix remains singular; equivalently, there are deterministic linear relationships among bond returns. So these models, too, are trivially rejected by the data.

To handle this problem, empirical researchers allow additional error terms to affect bond prices. These errors may be thought of in several ways: as measurement errors in bond prices, as errors in calculating implied zero-coupon prices from an observed coupon-bearing term structure, or as model specification errors arising from tax effects or transactions costs. Alternatively, one may use a model for the real stochastic discount factor and test it on nominal bonds in the manner of Gibbons and Ramaswamy (1993); in that case the errors may arise from unexpected inflation.

Whatever the source of the additional errors, auxiliary assumptions about their behavior are needed to keep the model testable. One common assumption is that bond-price errors are serially uncorrelated, although they may be correlated across bonds. This assumption makes it easy to examine the time-series implications of term-structure models. Other authors assume that bond-price errors are uncorrelated across bonds, although they may be correlated over time.


Affine-Yield Models as Latent-Variable Models 

Stambaugh (1988) and Heston (1992) show that under fairly weak
assumptions about the additional bond price errors, an affine-yield model implies
a latent-variable structure for bond returns. Variables that forecast bond
returns can do so only as proxies for the underlying state variables of the
model; if there are fewer state variables than forecasting variables, this puts
testable restrictions on forecasting equations for bond returns.

A general affine-yield model with $K$ state variables takes the form

$$-p_{nt} = A_n + B_{1n} x_{1t} + \cdots + B_{Kn} x_{Kt}, \qquad (11.2.7)$$

where $x_{kt}$, $k = 1, \ldots, K$, are the state variables, and $A_n$ and $B_{kn}$, $k = 1, \ldots, K$,
are constants. The model also implies that expected excess returns on long
bonds over the short interest rate can be written as

$$E_t[r_{n,t+1} - y_{1t}] = A_n^* + B_{1n}^* x_{1t} + \cdots + B_{Kn}^* x_{Kt}, \qquad (11.2.8)$$

where $A_n^*$ and $B_{kn}^*$, $k = 1, \ldots, K$, are constants. The model puts
cross-sectional restrictions on these constants which are related to the time-series
process driving the state variables, but we ignore this aspect of the model
here.


Now suppose that we do not observe the true excess returns on long
bonds, but instead observe a noisy measure

$$e_{n,t+1} = r_{n,t+1} - y_{1t} + \eta_{n,t+1}, \qquad (11.2.9)$$

where $\eta_{n,t+1}$ is an error term. We assume that $\eta_{n,t+1}$ is orthogonal to a vector
$\mathbf{h}_t$ containing $J$ instruments $h_{jt}$, $j = 1, \ldots, J$:

$$E[\eta_{n,t+1} \mid \mathbf{h}_t] = 0. \qquad (11.2.10)$$

The vector $\mathbf{h}_t$ might contain lagged variables, for example, if the return
error $\eta_{n,t+1}$ is serially uncorrelated. We further assume that for each state
variable $x_{kt}$, $k = 1, \ldots, K$, the expectation of the state variable conditional
on the instruments is linear in the instruments:

$$E[x_{kt} \mid \mathbf{h}_t] = \sum_{j=1}^{J} \theta_{kj} h_{jt} \qquad (11.2.11)$$

for some constant coefficients $\theta_{kj}$.

These assumptions imply that the expectation of $e_{n,t+1}$ conditional on
the instruments, which from (11.2.10) is the same as the expectation of the
true excess return $r_{n,t+1} - y_{1t}$ conditional on the instruments, is linear in the
instruments:

$$E[e_{n,t+1} \mid \mathbf{h}_t] = E[r_{n,t+1} - y_{1t} \mid \mathbf{h}_t] = A_n^* + \sum_{k=1}^{K} \sum_{j=1}^{J} B_{kn}^* \theta_{kj} h_{jt}. \qquad (11.2.12)$$

If we define $\mathbf{e}_{t+1}$ to be the vector $[e_{1,t+1} \ldots e_{N,t+1}]'$ for assets $n = 1, \ldots, N$,
then (11.2.12) can be rewritten in vector form as

$$E[\mathbf{e}_{t+1} \mid \mathbf{h}_t] = \mathbf{A}^* + \mathbf{C}\mathbf{h}_t, \qquad (11.2.13)$$

where $\mathbf{A}^*$ is a vector whose $n$th element is $A_n^*$ and $\mathbf{C}$ is a matrix of coefficients
whose $(n, j)$ element is

$$c_{nj} = \sum_{k=1}^{K} B_{kn}^* \theta_{kj}. \qquad (11.2.14)$$
Equations (11.2.13) and (11.2.14) define a latent-variable model for
expected excess bond returns with $K$ latent variables. Equation (11.2.14)
says that the $(N \times J)$ matrix of coefficients of $N$ assets on $J$ instruments has
rank at most $K$, where $K$ is the number of state variables in the underlying
term-structure model. The instruments forecast excess bond returns only
through their ability to proxy for the state variables (measured by the $\theta_{kj}$
coefficients) and the role of the state variables in determining excess bond
returns (measured by the $B_{kn}^*$ coefficients). The system is particularly easy
to understand in the single-factor case. Here $K = 1$, we can drop the $k$
subscripts, and (11.2.14) becomes

$$c_{nj} = B_n^* \theta_j. \qquad (11.2.15)$$


Equation (11.2.15) says that each row of the matrix $\mathbf{C}$ is proportional to each
other row, and the coefficients of proportionality are ratios of $B^*$ coefficients.
Note that the rank of the matrix $\mathbf{C}$ could be less than $K$; for example, it is
zero in a homoskedastic model with $K$ state variables because in such a
model the coefficients $B_{kn}^*$ are zero for all $k$ and $n$.
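The rank restriction in (11.2.14) and its single-factor form can be checked mechanically. In the sketch below, $\mathbf{C}$ is built from made-up $B^*$ and $\theta$ arrays; `coefficient_matrix`, `numerical_rank` and every parameter value are hypothetical. The checks are:

- With one state variable, the rows of C are proportional, with ratios $B_n^*/B_m^*$.
- With two state variables, the rank is two even though C is 4 × 5.
- With all $B^*$ equal to zero, as in a homoskedastic model, C vanishes.

```python
import numpy as np

def coefficient_matrix(b_star, theta):
    """C with elements c_nj = sum_k B*_kn * theta_kj, as in (11.2.14).

    b_star: (K, N) array of B*_kn; theta: (K, J) array of theta_kj.
    Returns the (N, J) matrix C = B*' Theta.
    """
    return b_star.T @ theta

def numerical_rank(a, rel_tol=1e-9):
    """Number of singular values that are not zero up to rounding error."""
    s = np.linalg.svd(a, compute_uv=False)
    return int(np.sum(s > rel_tol * s.max())) if s.max() > 0 else 0

rng = np.random.default_rng(1)
N, J = 4, 5

# Single state variable (K = 1): hypothetical B*_n and theta_j.
b1, th1 = rng.normal(size=(1, N)), rng.normal(size=(1, J))
C1 = coefficient_matrix(b1, th1)
print(numerical_rank(C1))                  # 1
print(C1[2] / C1[0], b1[0, 2] / b1[0, 0])  # each entry of the row ratio equals B*_3 / B*_1

# Two state variables: rank 2 even though C has 4 rows and 5 columns.
C2 = coefficient_matrix(rng.normal(size=(2, N)), rng.normal(size=(2, J)))
print(numerical_rank(C2))                  # 2

# Homoskedastic model: all B*_kn = 0, so C = 0 and its rank is zero.
C0 = coefficient_matrix(np.zeros((2, N)), rng.normal(size=(2, J)))
print(numerical_rank(C0))                  # 0
```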

Latent-variable models of the form (11.2.14) or (11.2.15) have been
applied to financial data by Campbell (1987), Gibbons and Ferson (1985),
and Hansen and Hodrick (1983). They can be estimated by Generalized
Method of Moments applied to the system of regression equations (11.2.13).
Heston (1992) points out that one can equivalently estimate a system of
instrumental variables regressions of excess returns on $K$ basis excess returns,
where the elements of $\mathbf{h}_t$ are the instruments.
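The latent-variable structure can also be shown with a rough simulation. The assumptions are ours and purely illustrative:

- one state variable;
- N = 4 noisy excess returns;
- J = 3 independent normal instruments;
- hypothetical values for $A^*$, $B^*$ and $\theta$.

Each excess return is regressed by OLS, without restrictions, on a constant and the instruments (the system (11.2.13) estimated equation by equation). The estimated C then has second and third singular values close to zero. This only eyeballs the rank restriction. A formal test would impose it through the GMM approach described above, which this sketch does not implement.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, J = 20_000, 4, 3
a_star = np.array([0.1, 0.2, 0.3, 0.4])     # hypothetical A*_n
b_star = np.array([0.5, 1.0, 1.5, 2.0])     # hypothetical B*_n (one state variable)
theta = np.array([1.0, -0.5, 0.8])          # hypothetical theta_j

h = rng.normal(size=(T, J))                                  # instruments h_t
x = h @ theta + rng.normal(size=T)                           # E[x_t | h_t] is linear in h_t
e = a_star + np.outer(x, b_star) + rng.normal(size=(T, N))   # noisy excess returns e_{t+1}

# Unrestricted regressions of each e_n on a constant and the J instruments.
X = np.column_stack([np.ones(T), h])
coef, *_ = np.linalg.lstsq(X, e, rcond=None)   # shape (1 + J, N)
C_hat = coef[1:].T                              # (N, J) estimate of C

sv = np.linalg.svd(C_hat, compute_uv=False)
print(sv)               # one large singular value, the others close to zero
print(sv[1] / sv[0])    # small when a single latent factor drives expected returns
```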

A key issue is how to choose instruments $\mathbf{h}_t$ that satisfy (11.2.10) (orthogonality
of instruments and bond pricing errors) and (11.2.11) (state
variables linear in the instruments). In an affine-yield model without bond
pricing errors, bond yields and forward rates are linear in the state variables;
hence the state variables are linear in yields and forward rates. This property
survives the addition of normally distributed bond pricing errors. Thus it is
natural to choose yields or forward rates as instruments satisfying (11.2.11).

To satisfy (11.2.10), one must be more specific about the nature of the
bond pricing errors. The error in a bond price measured at time $t$ affects
both the time $t$ bond yield and the excess return on the bond from $t$ to
$t+1$. Hence yields and forward rates measured at time $t$ are not likely to
be orthogonal to errors in excess bond returns from $t$ to $t+1$. If the bond
price errors are uncorrelated across time, however, then yields and forward
rates measured at time $t-1$ will be orthogonal to excess bond return errors
from $t$ to $t+1$. If the bond price errors are uncorrelated across bonds,
then one can instead choose a set of yields or forward rates measured at different
maturities than those used for excess returns. Stambaugh (1988) applies
both these strategies to monthly data on US Treasury bills of maturities two
to six months over the period 1959: to 1985:1. He finds strong evidence
against a model with one state variable and weaker evidence against a model
with two state variables. Heston (1992) studies a more recent period, 1970:2
to 1988:5, and a data set including longer maturities (6, 12, 36, and 60
months) and finds little evidence against a model with one state variable.
Evidence on the Short-Rate Process

If one is willing to assume that there is negligible measurement error in
the short-term nominal interest rate, then time-series analysis of short-rate
behavior may be a useful first step in building a nominal term-structure
model. Chan, Karolyi, Longstaff, and Sanders (1992) estimate a discrete-time
model for the short rate of the form

$$y_{1,t+1} - y_{1t} = \alpha + \beta y_{1t} + \epsilon_{t+1}, \qquad (11.2.16)$$

where

$$E_t[\epsilon_{t+1}] = 0, \qquad E_t[\epsilon_{t+1}^2] = \sigma^2 y_{1t}^{2\gamma}. \qquad (11.2.17)$$
This specification nests the single-factor models we discussed in Section 11.1;
the homoskedastic model has $\gamma = 0$, while the square-root model has
$\gamma = 0.5$. It also approximates a continuous-time diffusion process for the
instantaneous short rate $r(t)$ of the form $dr = (\beta_0 + \beta_1 r)\,dt + \sigma r^{\gamma}\,dB$. Such
a diffusion process nests the major single-factor continuous-time models for
the short rate. The Vasicek (1977) model, for example, has $\gamma = 0$; the
Cox, Ingersoll, and Ross (1985a) model has $\gamma = 0.5$; and the Brennan and
Schwartz (1979) model has $\gamma = 1$.⁵

Chan et al. (1992) estimate (11.2.16) and (11.2.17) by Generalized
Method of Moments. They define an error vector with two elements, the first
being $y_{1,t+1} - (1+\beta) y_{1t} - \alpha$ and the second being
$(y_{1,t+1} - (1+\beta) y_{1t} - \alpha)^2 - \sigma^2 y_{1t}^{2\gamma}$. These errors are orthogonal to any
instruments known at time $t$; a constant and the level of the short rate $y_{1t}$ are used as the instruments. In
monthly data on a one-month Treasury bill rate over the period 1964:6 to
1989:12, Chan et al. find that $\alpha$ and $\beta$ are small and often statistically
insignificant. The short-term interest rate is highly persistent, so it is hard to
reject the hypothesis that it follows a random walk. They also find that $\gamma$
is large and precisely estimated. They can reject all models which make
$\gamma < 1$, and their unrestricted estimate of $\gamma$ is about 1.5, almost two
standard errors above 1.
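The moment conditions just described are easy to code. With the instruments $(1, y_{1t})$ there are four moments and four parameters, so GMM is exactly identified. The first two moments reproduce OLS of the rate change on a constant and the lagged rate. For a fixed $\gamma$, the third moment gives $\sigma^2$. The fourth moment is then decreasing in $\gamma$, so a bisection search finds the unique root.

The sketch below checks this on data simulated from (11.2.16)-(11.2.17). It assumes normal shocks and hypothetical parameters ($\alpha = 0.0025$, $\beta = -0.05$, $\sigma = 0.1$, $\gamma = 1$). The function names are ours. It computes no standard errors and does not reproduce the authors' estimates.

```python
import numpy as np

def simulate_ckls(alpha, beta, sigma, gamma, y0, n, seed=0):
    """Simulate y_{t+1} - y_t = alpha + beta*y_t + eps_{t+1} with
    eps_{t+1} = sigma * y_t**gamma * z_{t+1}, z standard normal (an assumption)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    y = np.empty(n + 1)
    y[0] = y0
    for t in range(n):
        y[t + 1] = y[t] + alpha + beta * y[t] + sigma * y[t] ** gamma * z[t]
    return y

def ckls_gmm(y, lo=-2.0, hi=5.0, tol=1e-10):
    """Exactly identified GMM for (11.2.16)-(11.2.17) with instruments (1, y_t)."""
    y_lag, dy = y[:-1], np.diff(y)
    # Moments E[eps] = E[eps * y] = 0: OLS of the rate change on (1, y_t).
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    (alpha, beta), *_ = np.linalg.lstsq(X, dy, rcond=None)
    eps2 = (dy - alpha - beta * y_lag) ** 2

    def sigma2_given(g):
        # Moment E[eps^2 - sigma^2 y^(2g)] = 0 solved for sigma^2.
        return eps2.mean() / np.mean(y_lag ** (2 * g))

    def last_moment(g):
        # Sample E[(eps^2 - sigma^2 y^(2g)) * y]; decreasing in g.
        return np.mean(eps2 * y_lag) - sigma2_given(g) * np.mean(y_lag ** (2 * g + 1))

    while hi - lo > tol:                       # bisection for the root in g
        mid = 0.5 * (lo + hi)
        if last_moment(mid) > 0:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    return alpha, beta, np.sqrt(sigma2_given(gamma)), gamma

y = simulate_ckls(alpha=0.0025, beta=-0.05, sigma=0.1, gamma=1.0, y0=0.05, n=50_000)
est = ckls_gmm(y)
print(est)   # should be near (0.0025, -0.05, 0.1, 1.0)
```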

To understand these results, consider the case where $\alpha = \beta = 0$, so
the short rate is a random walk without drift. Then the error term $\epsilon_{t+1}$ in
(11.2.16) is just the change in the short rate $y_{1,t+1} - y_{1t}$, and (11.2.17) says that
the expectation of the squared change in the short rate, $E_t[(y_{1,t+1} - y_{1t})^2]$, is
$\sigma^2 y_{1t}^{2\gamma}$. Equivalently,

$$E_t\left[\left(\frac{y_{1,t+1} - y_{1t}}{y_{1t}^{\gamma}}\right)^2\right] = \sigma^2, \qquad (11.2.18)$$

so when the change in the short rate is scaled by the appropriate power of
the short rate, it becomes homoskedastic. Figures 11.1a through d illustrate
the results of Chan et al. by plotting changes in short rates scaled by various
powers of short rates. The figures show $(y_{1,t+1} - y_{1t})/y_{1t}^{\gamma}$ for $\gamma = 0$, 0.5,
1, and 1.5, using the data of McCulloch and Kwon (1993) over the period
1952:1 to 1991:2. Over the period since 1964 studied by Chan et al., it is
striking how the variance of the series appears to stabilize as one increases
$\gamma$ from 0 to 1.5.

⁵Note, however, that (11.2.16) and (11.2.17) do not nest the Pearson-Sun model.

These results raise two problems for single-factor affine-yield models of
the nominal term structure. First, when there is no mean-reversion in the
short rate, forward rates and bond yields may rise with maturity initially, but
they eventually decline with maturity and continue to do so forever. Second,
single-factor affine-yield models require that $\gamma = 0$ or 0.5 in (11.2.17). The
estimated value of 1.5 takes one outside the tractable class of affine-yield
models and forces one to solve the term-structure model numerically.

There is as yet no consensus about how to resolve these problems. Aït-Sahalia
(1996b) argues that existing parametric models are too restrictive;
he proposes a nonparametric method for estimating the drift and volatility of
the short interest rate. He argues that the short rate is very close to a random
walk when it is in the middle of its historical range (roughly, between 4% and
17%), but that it mean-reverts strongly when it gets outside this range. Chan
et al. miss this because their linear model does not allow mean-reversion to
depend on the level of the interest rate. Aït-Sahalia also argues that interest-rate
volatility is related to the level of the interest rate in a more complicated
way than is allowed by any standard model. His most general parametric
model, and the only one he does not reject statistically, has the short interest
rate following the diffusion $dr = (\beta_0 + \beta_1 r + \beta_2 r^2 + \beta_3/r)\,dt + (\alpha_0 + \alpha_1 r + \alpha_2 r^{\gamma})\,dB$.
He estimates $\gamma$ to be about 2, but the other parameters of the
volatility function also play an important role in determining volatility.

Following Hamilton (1989), an alternative view is that the short rate
randomly switches among different regimes, each of which has its own mean
and volatility parameters. Such a model may have mean-reversion within
each regime, but the short rate may appear to be highly persistent when one
averages data from different regimes. If regimes with high mean parameters
are also regimes with high volatility parameters, then such a model may also
explain the apparent sensitivity of interest rate volatility to the level of the
interest rate without invoking a high value of $\gamma$. Figures 11.1a-d show that
no single value of $\gamma$ makes scaled interest rate changes homoskedastic over
the whole period since 1952; the choice of $\gamma = 1.5$ works very well for
1964 to 1991 but worsens heteroskedasticity in the 1950s and early 1960s.⁶
Thus at least some regime changes are needed to fit the data, and it may
be that a model with $\gamma = 0$ or $\gamma = 0.5$ is adequate once regime changes
are allowed. Gray (1996) explores this possibility but estimates only slightly
lower values of $\gamma$ than Chan et al., while Naik and Lee (1994) solve for bond
and bond-option prices in a regime-shift model with $\gamma = 0$.

⁶Although this is not shown in the figures, the $\gamma = 1.5$ model also breaks down in the
1990s.

Brenner, Harjes, and Kroner (1996) move in a somewhat different
direction. They allow for GARCH effects on interest rate volatility, as
described in Section 12.2 of Chapter 12, as well as the level effect on volatility
described by (11.2.17). They replace (11.2.17) by $E_t[\epsilon_{t+1}^2] = \sigma_t^2 y_{1t}^{2\gamma}$ and
$\sigma_t^2 = \omega + \theta\epsilon_t^2 + \phi\sigma_{t-1}^2$, a standard GARCH(1,1) model. They find that a
model with $\gamma = 0.5$ fits the short rate series quite well once GARCH effects
are included in the model; however, they do not explore the implications of
this for bond or bond-option pricing.
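The volatility specification just described can be sketched as a recursion. A GARCH(1,1) scale factor, $\sigma_t^2 = \omega + \theta\epsilon_t^2 + \phi\sigma_{t-1}^2$, multiplies the level effect $y_{1t}^{2\gamma}$. The names $\theta$ and $\phi$ for the GARCH coefficients, and the function name, are our own labels. The sketch only evaluates conditional variances for given shocks and rates; it does not estimate the model.

```python
import numpy as np

def garch_level_variance(eps, y, gamma, omega, theta, phi, sigma2_init=None):
    """GARCH(1,1) scale with a level effect (a sketch of the specification above).

    eps[t]: shock realized at date t; y[t]: short rate at date t.
    sigma2[t] = omega + theta * eps[t]**2 + phi * sigma2[t-1]
    cond_var[t] = sigma2[t] * y[t]**(2*gamma), the conditional variance of eps[t+1].
    """
    eps = np.asarray(eps, dtype=float)
    y = np.asarray(y, dtype=float)
    if sigma2_init is None:
        # Start at the unconditional level omega / (1 - theta - phi); needs theta + phi < 1.
        sigma2_init = omega / (1.0 - theta - phi)
    sigma2 = np.empty_like(eps)
    prev = sigma2_init
    for t, e in enumerate(eps):
        prev = omega + theta * e**2 + phi * prev
        sigma2[t] = prev
    return sigma2, sigma2 * y ** (2.0 * gamma)

# Arithmetic check with made-up numbers: sigma2 = [0.8, 1.3] when sigma2_init = 1.
sigma2, cond_var = garch_level_variance([1.0, 2.0], [0.04, 0.05], gamma=0.5,
                                        omega=0.1, theta=0.2, phi=0.5, sigma2_init=1.0)
print(sigma2, cond_var)
```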


Cross-Sectional Restrictions on the Term Structure 

So far we have emphasized the time-series implications of affine-yield models
and have ignored their cross-sectional implications. Brown and Dybvig
(1986) and Brown and Schaefer (1994) take the opposite approach, ignoring
the models' time-series implications and estimating all the parameters
from the term structure of interest rates observed at a point in time. If this
procedure is repeated over many time periods, it generates a sequence of
parameter estimates which should in theory be identical for all time periods
but which in practice vary over time. The procedure is analogous to the
common practice of calculating implied volatility by inverting the Black-Scholes
formula using traded option prices; there too the model requires
that volatility be constant over time, but implied volatility tends to move over
time.

Of course, bond pricing errors might cause estimated parameters to
shift over time even if true underlying parameters are constant. But in simple
term-structure models there also appear to be some systematic differences
between the parameter values needed to fit cross-sectional term-structure
data and the parameter values implied by the time-series behavior of interest
rates. These systematic differences are indicative of misspecification in the
models. To understand the problem, we will choose parameters in the
single-factor homoskedastic and square-root models to fit various simple
moments of the data and will show that the resulting models do not match
some other characteristics of the data.

In the homoskedastic single-factor model, the important parameters of
the model can be identified by considering the following four moments of
the data:
The first-order autocorrelation of the short rate identifies the autoregressive
parameter $\phi$. Given $\phi$, the variance of the short rate then identifies the
innovation variance $\sigma^2$. Given $\phi$ and $\sigma^2$, the average excess return on a
very long-term bond, or equivalently the average difference between a very
long-term forward rate and the short rate, identifies the parameter $\beta$. Finally,
given $\phi$, $\sigma^2$, and $\beta$, the mean short rate identifies $\mu$.

In the zero-coupon yield data of McCulloch and Kwon (1993) over the
period 1952 to 1991, the monthly first-order autocorrelation of the short
rate is 0.98, implying $\phi = 0.98$. The standard deviation of the short rate is
3.064% at an annual rate or 0.00255 in natural units, implying $\sigma = 0.00051$
in natural units or 0.610% at an annual rate.
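The calibration arithmetic in this paragraph can be reproduced in a few lines. The sketch assumes, as in the homoskedastic model, that the short rate follows an AR(1), so that $\mathrm{Var}(y_1) = \sigma^2/(1-\phi^2)$. Annual percentages are converted to monthly natural units by dividing by 1200.

```python
import math

phi = 0.98                      # monthly first-order autocorrelation of the short rate
sd_annual_pct = 3.064           # std. dev. of the short rate, % at an annual rate

sd_monthly = sd_annual_pct / 100 / 12       # natural (monthly, decimal) units
# AR(1): Var(y) = sigma^2 / (1 - phi^2), so sigma = sd * sqrt(1 - phi^2).
sigma = sd_monthly * math.sqrt(1 - phi**2)
sigma_annual_pct = sigma * 12 * 100

print(f"{sd_monthly:.5f}")        # 0.00255
print(f"{sigma:.5f}")             # 0.00051
print(f"{sigma_annual_pct:.3f}")  # 0.610
```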

In the data there is some discrepancy between the average excess return
on long bonds, which from Table 10.2 is negative at −0.048% at an annual
rate for $n = 120$, and the average slope of the forward-rate curve, which
is positive at 1.507% at an annual rate when measured by the difference
between a 60-120 month forward rate and the 1-month short rate. The
difference occurs because interest rates rose over the period 1952 to 1991.
Stationary term-structure models force the true mean change in interest
rates to be zero, but an increase in interest rates in a particular sample can
make the sample mean excess return on long bonds negative even when
the sample mean slope of the forward-rate curve is positive. The value of
$\beta$ required to fit the average slope of the forward-rate curve is −122. The
implied value for $\mu - \sigma^2/2$, expressed at an annual rate, is 7.632%.

The difficulty with the homoskedastic single-factor model is that, with
these parameters, the average forward-rate term structure curves very gradually
from its initial value to its asymptote, as shown by the dashed line in
Figure 11.2. The sample average forward-rate curve over the 1952 to 1991
period, shown by the solid line in Figure 11.2, rises much more steeply at
first and then flattens out at about five years maturity.

Figure 11.2. Sample and Theoretical Average Forward-Rate Curves

This problem arises because the theoretical average forward-rate curve
approaches its asymptote approximately at geometric rate $\phi$. One could
match the sample average forward-rate curve more closely by choosing a
smaller value of $\phi$. Unfortunately this would be inconsistent not only with
the observed persistence of the short rate, but also with the observed pattern
of volatility in forward rates. Equation (11.1.14) shows that the standard
deviation of the forward rate declines at rate $\phi$. In the 1952 to 1991 period
the standard deviation of the $n$-period forward rate barely declines at all
with maturity $n$, and this feature of the data can only be fit by an extremely
persistent short-rate process. Backus and Zin (1994) discuss this problem
in detail and suggest that a higher-order model which allows both transitory
and persistent movements in the short rate can fit the term structure more
successfully.
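The tension comes from how slowly $\phi^n$ dies out. Under the homoskedastic model's geometric decay, two quantities shrink roughly in proportion to $\phi^n$:

- the standard deviation of the $n$-period forward rate;
- approximately, the gap between the average curve and its asymptote.

The sketch compares $\phi = 0.98$ with a hypothetical, less persistent $\phi = 0.90$; the function name is ours. The smaller $\phi$ flattens the curve quickly, but it implies that forward-rate volatility has all but vanished after five years.

```python
def decay_table(phi, maturities):
    """phi**n for each maturity n (months): the factor by which forward-rate
    volatility, and approximately the gap to the asymptote, has shrunk."""
    return {n: phi**n for n in maturities}

maturities = (12, 60, 120)
for phi in (0.98, 0.90):   # 0.98 from the data; 0.90 is a hypothetical alternative
    table = decay_table(phi, maturities)
    print(phi, {n: round(v, 4) for n, v in table.items()})
```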

Parameter identification is somewhat more difficult in the square-root
model. Here the moments given in equation (11.2.19) become

(11.2.20)

where $\bar{B}$ is the limiting value of $B_n$ from equation (11.1.23). As before, we
can identify $\phi = 0.98$ from the estimated first-order autocorrelation of the
short rate, but now the other parameters of the model are simultaneously
determined. One can of course estimate them by Generalized Method of
Moments. The square-root model, like the homoskedastic model, produces
an average forward-rate curve that approaches its asymptote very slowly when
the short rate is highly persistent; thus the model has many of the same
empirical limitations as the homoskedastic model.

In summary, the single-factor affine-yield models we have described in
this chapter are too restrictive to fit the behavior of nominal interest rates.
The latent-variable structure of the data, the nature of the short-rate
process, and the shape of the average term structure are all hard to fit with
these models. In response to this, researchers are exploring more general
models, including:

- affine-yield models in which the single state variable follows a higher-order
  ARMA process (Backus and Zin [1994]);
- affine-yield models with several state variables (Longstaff and Schwartz [1992]);
- regime-switching models (Gray [1996], Naik and Lee [1994]);
- GARCH models of interest rate volatility (Brenner, Harjes, and Kroner [1996]).

No one model has yet emerged as the consensus choice for modeling the nominal
term structure. We note, however, that Brown and Schaefer (1994) and
Gibbons and Ramaswamy (1993) have achieved some success in fitting simple
models to prices of UK index-linked bonds and econometric forecasts of
real returns on US nominal bonds. Thus single-factor affine-yield models
may be more appropriate for modeling real interest rates than for modeling
nominal interest rates.


11.3 Pricing Fixed-Income Derivative Securities 


One of the main reasons for the explosion of interest in term-structure
models is the practical need to price and hedge fixed-income derivative
securities. In this section we show how term-structure models can be used in
this context. Section 11.3.1 begins by discussing ways to augment standard
term-structure models so that they fit the current yield curve exactly. Derivatives
traders usually want to take this yield curve as given, and so they want
to use a pricing model that is fully consistent with all current bond prices.
We explain the popular approaches of Ho and Lee (1986), Black, Derman,
and Toy (1990), and Heath, Jarrow, and Morton (1992). Section 11.3.2
shows how term-structure models can be used to price forward and futures
contracts on fixed-income securities, while Section 11.3.3 explores option
pricing in the context of a term-structure model.
11.3.1 Fitting the Current Term Structure Exactly


In general a model gives an exact fit to as many data points as it has
parameters. The homoskedastic single-factor model presented in Section 11.1.1, for
example, has four parameters, $\phi$, $\beta$, $\sigma^2$, and $\mu$. Inevitably this model does
not fit the whole term structure exactly. To allow for this, the empirical work
of the previous section added error terms, reflecting model specification
error and measurement error in bond prices.

In pricing fixed-income derivative securities it may be desirable to have
a model that does fit the current term structure exactly. To achieve this, we
can use the result of Cox, Ingersoll, and Ross (1985a) and Dybvig (1989)
that one can add independent term-structure models together. A simple
approach, due originally to Ho and Lee (1986), is to break observed forward
rates $f_{nt}$ into two components:

$$f_{nt} = f_{nt}^a + f_{nt}^b, \qquad (11.3.1)$$

where $f_{nt}^a$ is the forward rate implied by a standard tractable model and $f_{nt}^b$ is
the residual. The residual component is then attributed to a deterministic
term-structure model. Since a deterministic process is independent of any
stochastic process, the decomposition (11.3.1) is always legitimate. There
is a corresponding decomposition of the log stochastic discount factor,

$$m_{t+1} = m_{t+1}^a + m_{t+1}^b. \qquad (11.3.2)$$

In a deterministic model, the absence of arbitrage requires that

$$f_{nt}^b = y_{1,t+n}^b = -m_{t+n+1}^b. \qquad (11.3.3)$$

Thus we are postulating that future stochastic discount factors contain a
deterministic component that is reflected in future short-term interest rates
and current forward rates.
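The following is a minimal sketch of the Ho and Lee style decomposition (11.3.1), using made-up forward curves. The residual $f^b$ is whatever the tractable model misses. Log forward rates add, so the tractable model's prices multiplied by the deterministic component's prices reproduce the observed zero-coupon prices exactly. The sketch uses the identity $p_{nt} = -(f_{0t} + \cdots + f_{n-1,t})$, which follows from $f_{nt} = p_{nt} - p_{n+1,t}$ and $p_{0t} = 0$. The curves and function name are hypothetical.

```python
import numpy as np

def zero_prices(forwards):
    """Zero-coupon prices P_0..P_N from one-period log forward rates f_0..f_{N-1},
    using p_n = -(f_0 + ... + f_{n-1}) and P_n = exp(p_n)."""
    return np.exp(-np.concatenate([[0.0], np.cumsum(forwards)]))

# Hypothetical per-period log forward rates (f_0 is the current short rate).
f_observed = np.array([0.040, 0.045, 0.050, 0.052, 0.053])
f_model = np.array([0.040, 0.042, 0.044, 0.045, 0.046])   # f^a from some tractable model

f_det = f_observed - f_model          # f^b, attributed to a deterministic model
print(f_det)                          # f^b_0 = 0: the current short rate needs no adjustment

# Adding independent models multiplies prices: P = P^a * P^b.
print(np.allclose(zero_prices(f_model) * zero_prices(f_det), zero_prices(f_observed)))  # True
```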

Although this procedure works well in any one period, there is nothing
to ensure that it will be consistent from period to period. A typical application
of the approach sets $y_{1t}^b = 0$, so that the current short rate is used as
an input into the stochastic term-structure model without any adjustment
for a deterministic component. Deterministic components of future short
rates $y_{1,t+i}^b$ are then set to nonzero values to fit the time $t$ term structure.
When time $t+1$ arrives, however, this procedure is repeated; now $y_{1,t+1}^b$ is
set to zero and deterministic components of more distant future short rates
are made nonzero to fit the time $t+1$ term structure. As Dybvig (1989)
emphasizes, this time inconsistency is troublesome, although the procedure
may work well for some purposes.

It is also important to understand that fitting one set of asset prices
exactly does not guarantee that a model will fit other asset prices accurately.
Backus, Foresi, and Zin (1996) illustrate this problem as follows. They
assume that the homoskedastic single-factor model of Section 11.1.1 holds,
with a mean-reverting short rate so $\phi < 1$. They show that one can exactly
fit the current term structure with a homoskedastic random walk model, a
lognormal version of Ho and Lee (1986). The model uses equation (11.1.5),
but replaces equation (11.1.3) with

$$x_{t+i} = x_{t+i-1} + \delta_{t+i} + \xi_{t+i}, \qquad (11.3.4)$$

where $\delta_{t+i}$ is a deterministic drift term that is specified at time $t$ for all
future dates $t+i$ in order to fit the time $t$ term structure of interest rates,
and as before $\xi_{t+i}$ is a normally distributed shock with constant variance $\sigma^2$.
Backus, Foresi, and Zin (1996) show that this model does not capture the
conditional means or variances of future interest rates, and so it misprices
options on bonds. Problem 11.1 works this out in detail.
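One ingredient of this argument can be previewed in a few lines. The drift terms can be chosen so that the random walk model matches today's term structure. Even so, its conditional variance of $x_{t+n}$ grows linearly, as $n\sigma^2$. The mean-reverting AR(1) process with $\phi < 1$ has conditional variance $\sigma^2(1-\phi^{2n})/(1-\phi^2)$, which levels off.

The sketch tabulates the ratio of the two variances using $\phi = 0.98$ and $\sigma = 0.00051$ per month from the calibration above; the function name is hypothetical. This is only a partial illustration, not the full Problem 11.1, which also concerns conditional means and option prices.

```python
def conditional_variances(phi, sigma2, horizon):
    """Var_t(x_{t+n}) for n = horizon under a mean-reverting AR(1) with
    coefficient phi and under a random walk with the same shock variance."""
    ar1 = sigma2 * (1.0 - phi ** (2 * horizon)) / (1.0 - phi**2)
    random_walk = sigma2 * horizon
    return ar1, random_walk

sigma2 = 0.00051**2      # monthly shock variance in natural units
for n in (1, 12, 60, 120):
    ar1, rw = conditional_variances(0.98, sigma2, n)
    print(n, round(rw / ar1, 2))   # ratio is 1 at n = 1 and grows with the horizon
```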

A somewhat more sophisticated procedure for fitting the term structure
of interest rates specifies future deterministic volatilities of short-rate
movements, as well as future deterministic drifts. Black, Derman, and Toy
(1990) do this in a lognormal model for the short rate. In the present
model one can replace the constant variance of $\xi_{t+1}$, $\sigma^2$, with a deterministically
time-varying one-period-ahead conditional variance $\sigma_t^2$. Backus,
Foresi, and Zin (1996) show that if the true model is the mean-reverting
homoskedastic model, the misspecified random walk model with deterministic
volatilities and drifts can fit any two of the following:

- the current term structure;
- the conditional means of future short rates;
- the conditional variances of future short rates.

However, it still cannot fit all three of these simultaneously, and
so it cannot correctly price a complete set of bond options. The lesson of
this example is that fixed-income derivative security prices depend on the
dynamic behavior of interest rates, so it is important to model interest rate
dynamics as accurately as possible even if one is interested only in pricing
derivative securities today.

A related approach that has become very popular is due to Heath, Jarrow,
and Morton (1992). These authors start from the current forward-rate
curve discussed in Chapter 10, and they suggest that one should specify a
term structure of forward volatilities to determine the movements of future
risk-adjusted forward rates. To understand this approach as simply as
possible, suppose that interest-rate risk is unpriced, so there are no risk premia
in bond markets and the objective process for forward rates coincides with
the risk-adjusted process. In this case bonds of all maturities must have
the same expected instantaneous return in a continuous-time setting, and
the same expected one-period return in a discrete-time setting. That is, the
one-period version of the pure expectations hypothesis of the term structure
(PEH) holds, so from (10.2.6) of Chapter 10 we have

$$E_t[r_{n,t+1} - y_{1t}] = -\tfrac{1}{2}\,\mathrm{Var}_t[r_{n,t+1} - y_{1t}]. \qquad (11.3.5)$$
The expected log excess return on a bond of any maturity over the one-period
interest rate is minus one-half the variance of the log excess return.

Now recall the relation between an $n$-period-ahead 1-period log forward
rate $f_{nt}$ and log bond prices, given as (10.1.8) in Chapter 10: $f_{nt} = p_{nt} - p_{n+1,t}$.
This implies that the change from time $t$ to time $t+1$ in a forward
rate for an investment to be made at time $t+n$ is

$$\begin{aligned}
f_{n-1,t+1} - f_{nt} &= (p_{n-1,t+1} - p_{n,t+1}) - (p_{nt} - p_{n+1,t}) \\
&= (p_{n-1,t+1} - p_{nt}) - (p_{n,t+1} - p_{n+1,t}) \\
&= (r_{n,t+1} - y_{1t}) - (r_{n+1,t+1} - y_{1t}). \qquad (11.3.6)
\end{aligned}$$

Taking expectations of (11.3.6) and using (11.3.5), we find that

$$E_t[f_{n-1,t+1} - f_{nt}] = \frac{1}{2}\left(\mathrm{Var}_t[r_{n+1,t+1} - y_{1t}] - \mathrm{Var}_t[r_{n,t+1} - y_{1t}]\right). \qquad (11.3.7)$$
The conditional variances of future excess bond returns determine the
expected changes in forward rates, and these expected changes together with
the current forward-rate curve determine the forward-rate curves and yield
curves that are expected to prevail at every date in the future. Similar
properties hold for the risk-adjusted forward-rate process even when interest-rate
risk is priced.
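The bookkeeping behind (11.3.6) and (11.3.7) can be verified numerically. The sketch draws an arbitrary, hypothetical pair of log zero-coupon price curves at $t$ and $t+1$. It checks that the change in the forward rate for a fixed future date equals the difference of two excess log returns. It then maps a made-up term structure of excess-return variances into the expected forward-rate changes that (11.3.7) implies when the one-period PEH holds.

```python
import numpy as np

rng = np.random.default_rng(3)
n_max = 10

def log_price_curve(rng, n_max):
    """Hypothetical log prices p_0 = 0, p_1, ..., p_{n_max+1} (decreasing in maturity)."""
    return np.concatenate([[0.0], -np.cumsum(rng.uniform(0.003, 0.006, n_max + 1))])

p_t, p_t1 = log_price_curve(rng, n_max), log_price_curve(rng, n_max)
y1 = -p_t[1]                                  # one-period yield at t

def fwd(p, n):        # f_n = p_n - p_{n+1}
    return p[n] - p[n + 1]

def ret(n):           # r_{n,t+1} = p_{n-1,t+1} - p_{n,t}
    return p_t1[n - 1] - p_t[n]

for n in range(1, n_max):
    lhs = fwd(p_t1, n - 1) - fwd(p_t, n)
    rhs = (ret(n) - y1) - (ret(n + 1) - y1)
    assert np.isclose(lhs, rhs)               # identity (11.3.6)

# (11.3.7): expected forward changes from hypothetical variances Var_t[r_{n,t+1} - y_1t].
var_excess = {n: (0.001 * (n - 1)) ** 2 for n in range(1, n_max + 1)}
expected_change = {n: 0.5 * (var_excess[n + 1] - var_excess[n]) for n in range(1, n_max)}
print(expected_change[1], expected_change[5])
```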

Heath, Jarrow, and Morton (1992) exploit this insight in a continuous-time
setting and show how it can be used to price fixed-income derivative
securities. It is still important to model interest-rate dynamics accurately
in this framework, but now the parameters of the model are expressed as
volatilities; many participants in the markets for fixed-income securities find
it easier to work with these parameters than with the parameters that govern
short-rate dynamics and interest-rate risk prices in traditional models. A
drawback of this approach, however, is that the implied process for the
short-term interest rate is generally extremely complicated.


11.3.2 Forwards and Futures 


A particularly simple kind of derivative security is a forward contract. An
$n$-period forward contract, negotiated at time $t$ on an underlying security
with price $S_{t+n}$ at time $t+n$, specifies a price at which the security will be
purchased at time $t+n$. Thus the forward price, which we write $G_{nt}$, is
determined at time $t$, but no money changes hands until time $t+n$.⁷

⁷The $n$-period forward rate defined in Section 10.1.1 of Chapter 10 is the yield on a forward
contract to buy a zero-coupon bond with maturity date $t+n+1$ at time $t+n$.

Cox, Ingersoll, and Ross (1981b) show that the forward price $G_{nt}$ is the
time $t$ price of a claim to a payoff of $S_{t+n}/P_{nt}$ at time $t+n$. Equivalently, $G_{nt}P_{nt}$
is the price of a claim to a payoff of $S_{t+n}$. Intuitively, the $P_{nt}$ terms appear
because no money need be paid until time $t+n$; thus the purchaser of a
forward contract has the use of money between $t$ and $t+n$. Cox, Ingersoll,
and Ross establish this proposition using a simple arbitrage argument. They
consider the following investment strategy: At time $t$, take a long position
in $1/P_{nt}$ forward contracts and put $G_{nt}$ into $n$-period bonds. By doing this
one can purchase $G_{nt}/P_{nt}$ bonds. The payoff from this strategy at time $t+n$
is

$$\frac{1}{P_{nt}}\left(S_{t+n} - G_{nt}\right) + \frac{G_{nt}}{P_{nt}} = \frac{S_{t+n}}{P_{nt}}, \qquad (11.3.8)$$

where the first term is the profit or loss on the forward contracts and the
second term is the payoff on the bonds. Since this investment strategy costs
$G_{nt}$ at time $t$ and pays $S_{t+n}/P_{nt}$ at time $t+n$, the proposition is established.
It can also be stated using stochastic-discount-factor notation as

$$G_{nt} = E_t\left[M_{n,t+n} \frac{S_{t+n}}{P_{nt}}\right], \qquad (11.3.9)$$

where the $n$-period stochastic discount factor $M_{n,t+n}$ is the product of $n$
successive one-period stochastic discount factors: $M_{n,t+n} = M_{t+1} \cdots M_{t+n}$.
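A four-state toy example, with all numbers hypothetical, makes the replication argument and (11.3.9) concrete. It computes $P_{nt}$ as the expected SDF and the forward price as $E_t[M_{n,t+n} S_{t+n}]/P_{nt}$. It then confirms two things: the strategy pays $S_{t+n}/P_{nt}$ in every state, and a forward contract struck at $G_{nt}$ has zero value.

```python
import numpy as np

# Hypothetical equally likely states at date t+n.
prob = np.full(4, 0.25)
M = np.array([0.97, 0.95, 0.94, 0.90])      # n-period SDF M_{n,t+n} in each state
S = np.array([80.0, 95.0, 105.0, 120.0])    # underlying price S_{t+n} in each state

P = np.sum(prob * M)                         # P_nt = E_t[M_{n,t+n}]
G = np.sum(prob * M * S / P)                 # forward price from (11.3.9)

# Strategy from (11.3.8): 1/P forwards plus G invested in n-period bonds (G/P bonds).
payoff = (S - G) / P + G / P
print(G)
print(np.allclose(payoff, S / P))                  # True: pays S_{t+n}/P_nt in every state
print(abs(np.sum(prob * M * (S - G))) < 1e-9)      # True: entering the forward costs nothing
```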

A futures contract differs from a forward contract in one important
respect: It is marked to market each period during the life of the contract, so that
the purchaser of a futures contract receives the futures price increase or pays
the futures price decrease each period. Because of these margin payments,
futures pricing, unlike forward pricing, generally involves more than just
the two periods $t$ and $t+n$.⁸ If we write the price of an $n$-period futures
contract as $H_{nt}$, then we have

$$H_{nt} = E_t\left[M_{t+1} \frac{H_{n-1,t+1}}{P_{1t}}\right]. \qquad (11.3.10)$$

⁸The Treasury bond and Treasury note futures contracts traded on the Chicago Board of
Trade also have a number of special option features that affect their prices. A trader with a
short position can choose to deliver on any day within the settlement month and can choose
to deliver any of a number of alternative bonds. The short trader also has a "wild card option" to
announce delivery at a particular day's settlement price any time in the six hours after that
price is determined. The discussion here abstracts from these option features; see Hull (1993,
Chapter 4) for an introduction to them.

This can be established using a similar argument to that of Cox, Ingersoll,
and Ross. Consider the following investment strategy: At time $t$, take a long
position in $1/P_{1t}$ futures contracts and put $H_{nt}$ into one-period bonds. By
doing this one can purchase $H_{nt}/P_{1t}$ bonds. At time $t+1$, liquidate the
futures contracts. The payoff from this strategy at time $t+1$ is

$$\frac{1}{P_{1t}}\left(H_{n-1,t+1} - H_{nt}\right) + \frac{H_{nt}}{P_{1t}} = \frac{H_{n-1,t+1}}{P_{1t}}, \qquad (11.3.11)$$

where the first term is the mark-to-market payment on the futures contracts
purchased at time $t$ and the second term is the payoff on the bonds. Because
the futures contracts are marked to market, the entire position can
be liquidated at time $t+1$ without generating any further cash flows at time
$t+n$. Since this investment strategy costs $H_{nt}$ at time $t$ and pays $H_{n-1,t+1}/P_{1t}$
at time $t+1$, we have shown that (11.3.10) holds. Furthermore, we can solve
(11.3.10) forward to time $t+n$, using the fact that $H_{0,t+n} = S_{t+n}$, to obtain

$$H_{nt} = E_t\left[M_{n,t+n} S_{t+n} \prod_{i=0}^{n-1} P_{1,t+i}^{-1}\right]. \qquad (11.3.12)$$


Comparing equations (11.3.9) and (11.3.12), we can see that there are some circumstances where forward contracts and futures contracts written on the same underlying asset with the same maturity have equal prices. First, if bond prices are not random then absence of arbitrage requires that $P_{nt} = \prod_{i=0}^{n-1} P_{1,t+i}$, so $G_{nt} = H_{nt}$. This means that forward and futures prices are equal in any model with a constant interest rate. Second, if there is only one period to maturity then $P_{nt} = P_{1t}$ and again $G_{1t} = H_{1t}$. Since futures contracts are marked to market daily the period here must be one day, so this result is of limited interest. Third, if the price of the underlying asset is not random, then forward and futures prices both equal the underlying asset price. To see this, note that if $S_{t+n} = V$, a constant, then (11.3.9) becomes


$$G_{nt} = E_t\left[\frac{M_{n,t+n} V}{P_{nt}}\right] = \frac{V}{P_{nt}}\,E_t[M_{n,t+n}] = V, \qquad (11.3.13)$$


since $P_{nt} = E_t[M_{n,t+n}]$. Under the same conditions $H_{0,t+n} = V$, and we can show that $H_{1,t+n-1} = V$ because (11.3.10) becomes


$$H_{1,t+n-1} = E_{t+n-1}\left[\frac{M_{t+n} V}{P_{1,t+n-1}}\right] = \frac{V}{P_{1,t+n-1}}\,E_{t+n-1}[M_{t+n}] = V. \qquad (11.3.14)$$


Repeating this argument backward one period at a time gives $H_{nt} = V$ for all $n$, so forward and futures prices are equal.

More generally, however, forward and futures prices may differ. In the case where the underlying asset is an $(n+\tau)$-period zero-coupon bond at time $t$, which will be a $\tau$-period bond at time $t+n$, we can write the forward price as $G_{n\tau t}$ and the futures price as $H_{n\tau t}$. The forward price is easy to calculate in this case:


$$G_{n\tau t} = \frac{E_t[M_{n,t+n} P_{\tau,t+n}]}{P_{nt}} = \frac{P_{n+\tau,t}}{P_{nt}}. \qquad (11.3.15)$$


When $\tau = 1$ the yield on this forward contract is the forward rate defined in Section 10.1.1 of Chapter 10: $1 + F_{nt} = 1/G_{n1t}$.

The futures price must be calculated recursively from equation (11.3.10). In a particular term-structure model one can do the calculation explicitly and solve for the relation between forward and futures prices. Problem 11.2 is to do this for the homoskedastic single-factor model developed in Section 11.1.1. The problem is to show that the ratio of forward to futures prices is constant in that model, and that it exceeds one so that forward prices are always greater than futures prices.
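The recursion (11.3.10) is easy to carry out numerically. The sketch below is a minimal illustration, not the homoskedastic model of Problem 11.2: it assumes a hypothetical recombining binomial lattice for the one-period log short rate, $r_0 + (2j - i)h$ at time $i$ after $j$ up-moves, with risk-neutral probability 1/2 on each branch, so that $E_t[M_{t+1}Y] = P_{1t}$ times the average of $Y$ over the two successor nodes. Bond prices come from backward induction, the forward price from (11.3.15), and the futures price from rolling (11.3.10) back from $H_{0,t+n} = P_{\tau,t+n}$. The function name `forward_and_futures` and the parameters `r0` and `h` are hypothetical.

```python
import math


def forward_and_futures(n, tau, r0=0.05, h=0.005):
    """Time-0 forward price (11.3.15) and futures price (11.3.10) for delivery at
    period n of a zero-coupon bond that pays 1 at period n + tau.

    Hypothetical lattice: the one-period log short rate at time i after j up-moves
    is r0 + (2j - i) * h, and each branch has risk-neutral probability 1/2, so
    E_t[M_{t+1} Y] = P_{1t} * (average of Y over the two successor nodes).
    """
    def rate(i, j):
        return r0 + (2 * j - i) * h

    def discount_back(values, start, stop):
        # Roll node values from time `start` back to time `stop`, discounting each
        # step at the node's one-period rate.
        for i in range(start - 1, stop - 1, -1):
            values = [math.exp(-rate(i, j)) * 0.5 * (values[j] + values[j + 1])
                      for j in range(i + 1)]
        return values

    # Price at each time-n node of the bond that then has tau periods to maturity.
    underlying = discount_back([1.0] * (n + tau + 1), n + tau, n)
    p_n = discount_back([1.0] * (n + 1), n, 0)[0]      # P_{n,0}
    p_n_tau = discount_back(underlying, n, 0)[0]        # P_{n+tau,0}
    forward = p_n_tau / p_n                              # (11.3.15)

    # (11.3.10): H_{k,t} = E_t[M_{t+1} H_{k-1,t+1}] / P_{1t}, which in this lattice is
    # the undiscounted average of the two successor values, starting from
    # H_{0,n} = underlying bond price.
    futures = list(underlying)
    for i in range(n - 1, -1, -1):
        futures = [0.5 * (futures[j] + futures[j + 1]) for j in range(i + 1)]
    return forward, futures[0]


for n in (1, 3, 5):
    g, hf = forward_and_futures(n, tau=2)
    print(n, round(g, 6), round(hf, 6), g > hf)
```

With one period to delivery the two prices coincide, as in the second special case above. With longer delivery horizons the forward price exceeds the futures price in this lattice, because the realized discount factor and the delivered bond price both fall when rates rise, so they covary positively.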


11.3.3 Option Pricing in a Term-Structure Model 


Suppose one wants to price a European call option written on an underlying security with price $S_t$.¹⁰ If the option has $n$ periods to expiration and exercise price $X$, then its terminal payoff is $\max(S_{t+n} - X, 0)$. It can be priced like any other $n$-period asset using the $n$-period stochastic discount factor $M_{n,t+n} = M_{t+1} M_{t+2} \cdots M_{t+n}$. Writing the option price as $C_{nt}(X)$, we have


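$$C_{nt}(X) = E_t[M_{n,t+n}\max(S_{t+n} - X, 0)] = E_t[M_{n,t+n} S_{t+n}\mathbf{1}\{S_{t+n} \ge X\}] - X\,E_t[M_{n,t+n}\mathbf{1}\{S_{t+n} \ge X\}], \qquad (11.3.16)$$
where $\mathbf{1}\{\cdot\}$ is an indicator function equal to one when its argument is true and zero otherwise.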
In general equation (11.3.16) must be evaluated using numerical methods, but it simplifies dramatically in one special case. Suppose that $M_{n,t+n}$ and $S_{t+n}$ are jointly lognormal conditional on time-$t$ information, with conditional expectations of their logs $\mu_m$ and $\mu_s$, and conditional variances and covariance of their logs $\sigma_{mm}$, $\sigma_{ss}$, and $\sigma_{ms}$. All these moments may depend on $t$ and $n$, but we suppress this for notational simplicity. Then we have
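$$E_t[M_{n,t+n}\mathbf{1}\{S_{t+n} \ge X\}] = \exp\left(\mu_m + \frac{\sigma_{mm}}{2}\right)\Phi\left(\frac{\mu_s + \sigma_{ms} - x}{\sqrt{\sigma_{ss}}}\right) \qquad (11.3.17)$$
and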


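$$E_t[M_{n,t+n} S_{t+n}\mathbf{1}\{S_{t+n} \ge X\}] = \exp\left(\mu_m + \mu_s + \frac{\sigma_{mm} + \sigma_{ss} + 2\sigma_{ms}}{2}\right)\Phi\left(\frac{\mu_s + \sigma_{ss} + \sigma_{ms} - x}{\sqrt{\sigma_{ss}}}\right), \qquad (11.3.18)$$
where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable, and $x = \log(X)$.¹¹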
Equations (11.3.17) and (11.3.18) hold for any lognormal random variables $M$ and $S$ and do not depend on any other properties of these variables.



¹⁰The notation here differs from the notation used in Chapter 9. There $P_t$ is used for the underlying security price, but here we reserve $P$ for zero-coupon bond prices and use $S$ for a generic security price.


¹¹These results were derived by Rubinstein (1976); see also Huang and Litzenberger (1988).
But we know from asset pricing theory that the underlying security price $S_t$ must satisfy
$$S_t = E_t[M_{n,t+n} S_{t+n}] = \exp\left(\mu_m + \mu_s + \frac{\sigma_{mm} + \sigma_{ss} + 2\sigma_{ms}}{2}\right). \qquad (11.3.19)$$
We also know that the price of an $n$-period zero-coupon bond, $P_{nt}$, must satisfy
$$P_{nt} = E_t[M_{n,t+n}] = \exp\left(\mu_m + \frac{\sigma_{mm}}{2}\right). \qquad (11.3.20)$$


Using (11.3.19) and (11.3.20) to simplify (11.3.17) and (11.3.18), and substituting into (11.3.16), we get an expression for the price of a call option when the underlying security is jointly lognormal with the multiperiod stochastic discount factor:


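$$C_{nt}(X) = S_t\,\Phi\left(\frac{s_t - p_{nt} - x + \sigma_{ss}/2}{\sqrt{\sigma_{ss}}}\right) - X P_{nt}\,\Phi\left(\frac{s_t - p_{nt} - x - \sigma_{ss}/2}{\sqrt{\sigma_{ss}}}\right), \qquad (11.3.21)$$
where $s_t = \log(S_t)$ and $p_{nt} = \log(P_{nt})$.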
To get the standard option pricing formula of Black and Scholes (1973), we need two further assumptions. First, assume that the conditional variance of the log underlying security price $n$ periods ahead, $\sigma_{ss}$, is proportional to $n$: $\sigma_{ss} = n\sigma^2$ for some constant $\sigma^2$. Second, assume that the term structure is flat so that $P_{nt} = \exp(-nr)$ for some constant interest rate $r$. With these additional assumptions, (11.3.21) yields the Black-Scholes formula,¹²
$$C_{nt}(X) = S_t\,\Phi\left(\frac{s_t - x + nr + n\sigma^2/2}{\sigma\sqrt{n}}\right) - X e^{-nr}\,\Phi\left(\frac{s_t - x + nr - n\sigma^2/2}{\sigma\sqrt{n}}\right). \qquad (11.3.22)$$
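As a check on the algebra, (11.3.21) and its Black-Scholes special case (11.3.22) can be coded directly. The sketch below is a minimal Python version; the function names `norm_cdf`, `lognormal_call`, and `black_scholes_call` are illustrative, and the standard normal CDF is computed from `math.erf`.

```python
import math


def norm_cdf(z):
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_call(S, P, X, sigma_ss):
    """Call price from (11.3.21).

    S        -- current underlying price S_t
    P        -- n-period zero-coupon bond price P_nt
    X        -- exercise price
    sigma_ss -- conditional variance of log S_{t+n}
    """
    vol = math.sqrt(sigma_ss)
    m = math.log(S) - math.log(P) - math.log(X)   # s_t - p_nt - x
    return (S * norm_cdf((m + sigma_ss / 2) / vol)
            - X * P * norm_cdf((m - sigma_ss / 2) / vol))


def black_scholes_call(S, X, r, sigma, n):
    """(11.3.22): the special case P_nt = exp(-n r) and sigma_ss = n sigma^2."""
    return lognormal_call(S, math.exp(-n * r), X, n * sigma ** 2)


print(round(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1), 4))  # 10.4506
```

The only inputs (11.3.21) needs are the current prices $S_t$ and $P_{nt}$ and the single variance $\sigma_{ss}$; the moments $\mu_m$, $\mu_s$, $\sigma_{mm}$, and $\sigma_{ms}$ have been absorbed by (11.3.19) and (11.3.20).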
For fixed-income derivatives, however, the extra assumptions needed to get the Black-Scholes formula (11.3.22) are not reasonable. Suppose that the asset on which the call option is written is a zero-coupon bond which currently has $n+\tau$ periods to maturity. If the option has exercise price $X$ and $n$ periods to expiration, the option's payoff at expiration will be $\max(P_{\tau,t+n} - X, 0)$. The relevant bond price at expiration is the $\tau$-period bond price since the maturity of the bond shrinks over time. In a term-structure model the conditional variance of the log $\tau$-period bond price $n$ periods ahead is not generally $n$ times the conditional variance of the log $(n+\tau-1)$-period bond price one period ahead. Also, of course, the term structure is generally not flat in a term-structure model.

¹²Of course, for any given $n$ we can always define $\sigma^2 = \sigma_{ss}/n$ and $r = -p_{nt}/n$, so that the Black-Scholes formula applies for that $n$. The assumptions given are needed for the Black-Scholes formula to apply to all $n$ with the same $\sigma^2$ and $r$.

To get closed-form solutions for interest-rate derivative prices we need a term-structure model in which bond prices and stochastic discount factors are conditionally lognormal at all horizons; that is, we need the homoskedastic single-factor model of Section 11.1.1 or some multifactor generalization of it. In the single-factor model we can use the option pricing formula (11.3.21) with the following inputs: $S_t = P_{n+\tau,t} = \exp(-A_{n+\tau} - B_{n+\tau}x_t)$, $P_{nt} = \exp(-A_n - B_n x_t)$, and
$$\sigma_{ss} = B_\tau^2\,\mathrm{Var}_t[x_{t+n}]. \qquad (11.3.23)$$


This expression for $\sigma_{ss}$ does not grow linearly with $n$. Hence if one uses the Black-Scholes formula (11.3.22) and calculates implied volatility, the implied volatility will depend on the maturity of the option; there will be a term structure of implied volatility that will depend on the parameters of the underlying term-structure model. Jamshidian (1989) presents a continuous-time version of this result, and Turnbull and Milne (1991) derive it in discrete time along with numerous results for other types of derivative securities. Option pricing is considerably more difficult in a square-root model, but Cox, Ingersoll, and Ross (1985a) present some useful results.
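To see why $\sigma_{ss}$ does not grow linearly with $n$, suppose, as an assumption for this sketch, that the state variable follows the AR(1) $x_{t+1} = (1-\phi)\mu + \phi x_t + \xi_{t+1}$ with $\mathrm{Var}(\xi_{t+1}) = \sigma^2$, so that $\mathrm{Var}_t[x_{t+n}] = \sigma^2(1 + \phi^2 + \cdots + \phi^{2(n-1)})$. As footnote 12 notes, setting $r = -p_{nt}/n$ makes the per-period variance a Black-Scholes user backs out equal to $\sigma_{ss}/n$. The code takes $B_\tau$ as a given number rather than computing it, and the parameter values and function names are hypothetical.

```python
def sigma_ss(n, B_tau, phi, sigma):
    """(11.3.23) under an assumed AR(1) state with shock variance sigma^2:
    Var_t[x_{t+n}] = sigma^2 * (1 + phi^2 + ... + phi^(2(n-1)))."""
    var_x = sigma ** 2 * sum(phi ** (2 * k) for k in range(n))
    return B_tau ** 2 * var_x


def bs_implied_variance_per_period(n, B_tau, phi, sigma):
    """Per-period Black-Scholes variance implied by (11.3.21) when r = -p_nt / n."""
    return sigma_ss(n, B_tau, phi, sigma) / n


# Hypothetical parameter values, chosen only for illustration.
B_tau, sigma = 1.9, 0.002
for phi in (0.9, 1.0):
    curve = [bs_implied_variance_per_period(n, B_tau, phi, sigma) for n in (1, 3, 6, 12, 24)]
    print(phi, [f"{v:.3e}" for v in curve])
```

For $\phi < 1$ the implied per-period variance falls with option maturity, because the mean-reverting state variable has bounded variance as $n$ grows; at $\phi = 1$ it is flat, the one case in which this input satisfies the Black-Scholes assumption $\sigma_{ss} = n\sigma^2$.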

Investment professionals often want to price options in a way that is exactly consistent with the current term structure of interest rates. To do this, we can break the $n$-period stochastic discount factor into two components:
where, as in Section 11.3.1, the $a$-component is stochastic while the $b$-component is deterministic. There is a corresponding decomposition of bond prices for any maturity $j$: $P_{jt} = P^a_{jt} P^b_{jt}$. Then it is easy to show that
where $C^a$ is the call option price that would prevail if the stochastic discount factor were $M^a$. In other words options can be priced using the stochastic term-structure model, using the deterministic model only to adjust the exercise price and the final solution for the option price. This approach was first used by Ho and Lee (1986); however, as Dybvig (1989) points out, Ho and Lee choose as their $a$-model the single-factor homoskedastic model with $\phi = 1$, which has numerous unappealing properties. Black, Derman, and Toy (1990), Heath, Jarrow, and Morton (1992), and Hull and White (1990a) use similar approaches with different choices for the $a$-model.


11.4 Conclusion 


In this chapter we have thoroughly explored a tractable class of interest-rate models, the so-called affine-yield models. In these models log bond yields are linear in state variables, which simplifies the analysis of the term structure of interest rates and of fixed-income derivative securities. We have also seen that affine-yield models have some limitations, particularly in describing the dynamics of the short-term nominal interest rate. There is accordingly great interest in developing more flexible models that allow for such phenomena as multiple regimes, nonlinear mean-reversion, and serially correlated interest-rate volatility, and that fully exploit the information in the yield curve.

As the term-structure literature moves forward, it will be important to integrate it with the rest of the asset pricing literature. We have seen that term-structure models can be viewed as time-series models for the stochastic discount factor. The research on stock returns discussed in Chapter 8 also seeks to characterize the behavior of the stochastic discount factor. By combining the information in the prices of stocks and fixed-income securities it should be possible to gain a better understanding of the economic forces that determine the prices of financial assets.


Problems—Chapter 11 


11.1 Assume that the homoskedastic lognormal bond pricing model given by equations (11.1.3) and (11.1.5) holds with $\phi < 1$.
11.1.1 Suppose you fit the current term structure of interest rates using a random walk model augmented by deterministic drift terms, equation (11.3.4). Derive an expression relating the drift terms to the state variable $x_t$ and the parameters of the true bond pricing model.


11.1.2 Compare the expected future log short rates implied by the true 
bond pricing model and the random walk model with deterministic drifts. 


11.1.3 Compare the time-$t$ conditional variances of log bond prices at time $t+1$ implied by the true bond pricing model and the random walk model with deterministic drifts.


11.1.4 Compare the prices of bond options implied by the true bond pricing model and the random walk model with deterministic drifts.


Note: This question is based on Backus and Zin (1994).


11.2 Define $G_{n\tau t}$ to be the price at time $t$ of an $n$-period forward contract on a zero-coupon bond which matures at time $t+n+\tau$. Define $H_{n\tau t}$ to be the price at time $t$ of an $n$-period futures contract on the same zero-coupon bond. Assume that the homoskedastic single-factor term-structure model of Section 11.1.1 holds.


11.2.1 Show that both the log forward price $g_{n\tau t}$ and the log futures price $h_{n\tau t}$ are affine in the state variable $x_t$. Solve for the coefficients determining these prices as functions of the term-structure coefficients $A_n$ and $B_n$.


11.2.2 Show that the ratio of forward to futures prices is constant and 
greater than one. Give some economic intuition for this result. 


11.2.3 For the parameter values in Section 11.2.2, plot the ratio of forward prices to futures prices as a function of maturity $n$.


Note: This question is based on Backus, Foresi, and Zin (1996). 
Nonlinearities in Financial Data


THE ECONOMETRIC METHODS we discuss in this text are almost all designed to detect linear structure in financial data. In Chapter 2, for example, we develop time-series tests for predictability of asset returns that use weighted combinations of return autocorrelations; linear predictability is the focus. The event study of Chapter 4, and the CAPM and APT of Chapters 5 and 6, are based on linear models of expected returns. And even when we broaden our focus in later chapters to include other economic variables such as consumption, dividends, and interest rates, the models remain linear. This emphasis on linearity should not be too surprising since many of the economic models that drive financial econometrics are linear models.

However, many aspects of economic behavior may not be linear. Experimental evidence and casual introspection suggest that investors' attitudes towards risk and expected return are nonlinear. The terms of many financial contracts such as options and other derivative securities are nonlinear. And the strategic interactions among market participants, the process by which information is incorporated into security prices, and the dynamics of economy-wide fluctuations are all inherently nonlinear. Therefore, a natural frontier for financial econometrics is the modeling of nonlinear phenomena.

This is quite a challenge, since the collection of nonlinear models is much "larger" than the collection of linear models; after all, everything which is not linear is nonlinear. Moreover, nonlinear models are generally more difficult to analyze than linear ones, rarely producing closed-form expressions which can be easily manipulated and empirically implemented. In some cases, the only mode of analysis is computational, and this is unfamiliar territory to those of us who are accustomed to thinking analytically, intuitively, and linearly.

But economists of a new generation are creating new models and tools that can capture nonlinearities in economic phenomena, and some of these models and tools are the focus of this chapter. Exciting advances in dynamical systems theory, nonlinear time-series analysis, stochastic-volatility models, nonparametric statistics, and artificial neural networks have fueled the recent interest in nonlinearities in financial data, and we shall explore each of these topics in the following sections.

Section 12.1 revisits some of the issues raised in Chapter 2 regarding predictability, but from a linear-versus-nonlinear perspective. We present a taxonomy of models that distinguishes between models that are nonlinear in mean and hence depart from the martingale hypothesis, and models that are nonlinear in variance and hence depart from independence but not from the martingale hypothesis.

Section 12.2 explores in greater detail models that are nonlinear in variance, including univariate and multivariate Generalized Autoregressive Conditionally Heteroskedastic (GARCH) and stochastic-volatility models.

In Sections 12.3 and 12.4 we move beyond parametric time-series models to explore nonparametric methods for fitting nonlinear relationships between variables, including smoothing techniques and artificial neural networks. Although these techniques are able to uncover a variety of nonlinearities, they are heavily data-dependent and computationally intensive. To illustrate the power of these techniques, we present an application to the pricing and hedging of derivative securities and to estimating state-price densities.

We also discuss some of the limitations of these techniques in Section 12.5. The most important limitations are the twin problems of overfitting and data-snooping, which plague linear models too but not nearly to the same degree. Unfortunately, we have very little to say about how to deal with these issues except in very special cases, hence this is an area with many open research questions to be answered.


12.1 Nonlinear Structure in Univariate Time Series 


A typical time-series model relates an observed time series $x_t$ to an underlying sequence of shocks $\epsilon_t$. In linear time-series analysis the shocks are assumed to be uncorrelated but are not necessarily assumed to be IID. By the Wold Representation Theorem any covariance-stationary time series (after removal of any deterministic component) can be written as an infinite-order linear moving average of such shocks, and this linear moving-average representation summarizes the unconditional variance and autocovariances of the series.

In nonlinear time-series analysis the underlying shocks are typically assumed to be IID, but we seek a possibly nonlinear function relating the series $x_t$ to the history of the shocks. A general representation is
$$x_t = f(\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \ldots), \qquad (12.1.1)$$
where the shocks are assumed to have mean zero and unit variance, and $f(\cdot)$ is some unknown function. The generality of this representation makes it very hard to work with; most models used in practice fall into a somewhat more restricted class that can be written as
$$x_t = g(\epsilon_{t-1}, \epsilon_{t-2}, \ldots) + \epsilon_t\,h(\epsilon_{t-1}, \epsilon_{t-2}, \ldots). \qquad (12.1.2)$$


The function $g(\cdot)$ represents the mean of $x_t$ conditional on past information, since $E_{t-1}[x_t] = g(\epsilon_{t-1}, \epsilon_{t-2}, \ldots)$. The innovation in $x_t$ is proportional to the shock $\epsilon_t$, where the coefficient of proportionality is the function $h(\cdot)$. The square of this function is the variance of $x_t$ conditional on past information, since $E_{t-1}[(x_t - E_{t-1}[x_t])^2] = h(\epsilon_{t-1}, \epsilon_{t-2}, \ldots)^2$. Models with nonlinear $g(\cdot)$ are said to be nonlinear in mean, whereas models with nonlinear $h(\cdot)$ are said to be nonlinear in variance.

To understand the restrictions imposed by (12.1.2) on (12.1.1), consider expanding (12.1.1) in a Taylor series around $\epsilon_t = 0$ for given $\epsilon_{t-1}, \epsilon_{t-2}, \ldots$:
$$x_t = f(0, \epsilon_{t-1}, \ldots) + \epsilon_t f_1(0, \epsilon_{t-1}, \ldots) + \tfrac{1}{2}\epsilon_t^2 f_{11}(0, \epsilon_{t-1}, \ldots) + \cdots, \qquad (12.1.3)$$
where $f_1$ is the derivative of $f$ with respect to $\epsilon_t$, its first argument; $f_{11}$ is the second derivative of $f$ with respect to $\epsilon_t$; and so forth. To obtain (12.1.2), we drop the higher-order terms in the Taylor expansion and set $g(\epsilon_{t-1}, \ldots) = f(0, \epsilon_{t-1}, \ldots)$ and $h(\epsilon_{t-1}, \ldots) = f_1(0, \epsilon_{t-1}, \ldots)$. By dropping higher-order terms we link the time-variation in the higher conditional moments of $x_t$ inflexibly with the time-variation in the second conditional moment of $x_t$, since for all powers $p \ge 2$, $E_{t-1}[(x_t - E_{t-1}[x_t])^p] = h(\cdot)^p E[\epsilon_t^p]$. Those who are interested primarily in the first two conditional moments of $x_t$ regard this restriction as a price worth paying for the greater tractability of (12.1.2).

Equation (12.1.2) leads to a natural division in the nonlinear time-series literature between models of the conditional mean $g(\cdot)$ and models of the conditional variance $h(\cdot)^2$. Most time-series models concentrate on one form of nonlinearity or the other. A simple nonlinear moving-average model, for example, takes the form
$$x_t = \epsilon_t + \alpha\epsilon_{t-1}^2. \qquad (12.1.4)$$
Here $g(\cdot) = \alpha\epsilon_{t-1}^2$ and $h(\cdot) = 1$. This model is nonlinear in mean but not in variance. The first-order Autoregressive Conditionally Heteroskedastic (ARCH) model of Engle (1982), on the other hand, takes the form
$$x_t = \epsilon_t\sqrt{\alpha\epsilon_{t-1}^2}. \qquad (12.1.5)$$
Here $g(\cdot) = 0$ and $h(\cdot) = \sqrt{\alpha\epsilon_{t-1}^2}$. This model is nonlinear in variance but not in mean.
One way to understand the distinction between nonlinearity in mean and nonlinearity in variance is to consider the moments of the $x_t$ process. As we have emphasized, nonlinear models can be constructed so that the second moments (autocovariances) $\mathrm{Cov}[x_t, x_{t-i}]$ are all zero for $i > 0$. In the two examples above it is easy to confirm that this is the case provided that $\epsilon_t$ is symmetrically distributed, i.e., its third moment is zero. For the nonlinear moving average (12.1.4), for example, independence of the shocks gives $\mathrm{Cov}[x_t, x_{t-1}] = \mathrm{Cov}[\epsilon_t + \alpha\epsilon_{t-1}^2,\ \epsilon_{t-1} + \alpha\epsilon_{t-2}^2] = \alpha E[\epsilon_{t-1}^3]$, which is zero when $E[\epsilon_t^3] = 0$.

Now consider the behavior of higher moments of the form
$$E[x_t\,x_{t-i}\,x_{t-j}\,x_{t-k}\cdots].$$
Models that are nonlinear in the mean allow these higher moments to be nonzero when $i, j, k, \ldots > 0$. Models that are nonlinear in variance but obey the martingale property have $E[x_t \mid x_{t-1}, \ldots] = 0$, so their higher moments are zero when $i, j, k, \ldots > 0$. These models can only have nonzero higher moments if at least one time lag index $i, j, k, \ldots$ is zero. In the nonlinear-moving-average example (12.1.4), the third moment with $i = j = 1$ is, using $E[\epsilon_t^3] = 0$,
$$E[x_t x_{t-1}^2] = E[(\epsilon_t + \alpha\epsilon_{t-1}^2)(\epsilon_{t-1} + \alpha\epsilon_{t-2}^2)^2] = \alpha E[\epsilon_{t-1}^4] + \alpha^3 E[\epsilon_{t-1}^2]\,E[\epsilon_{t-2}^4] \neq 0.$$
In the first-order ARCH example (12.1.5), the same third moment is $E[x_t x_{t-1}^2] = E[\epsilon_t\sqrt{\alpha\epsilon_{t-1}^2}\,\epsilon_{t-1}^2\,\alpha\epsilon_{t-2}^2] = 0$. But for this model the fourth moment with $i = 0$, $j = k = 1$ is $E[x_t^2 x_{t-1}^2] = E[\epsilon_t^2\,\alpha^2\epsilon_{t-1}^4\,\epsilon_{t-2}^2] = \alpha^2 E[\epsilon_{t-1}^4] \neq 0$.
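These moment calculations are easy to confirm by simulation. The sketch below assumes standard normal shocks (so $E[\epsilon_t^3] = 0$ and $E[\epsilon_t^4] = 3$) and $\alpha = 0.5$, values chosen only for illustration. With these choices both models have zero first-order autocorrelation, the nonlinear moving average has $E[x_t x_{t-1}^2] = \alpha E[\epsilon^4] + \alpha^3 E[\epsilon^4] = 1.875$, and the ARCH model has $E[x_t x_{t-1}^2] = 0$ but $E[x_t^2 x_{t-1}^2] = 3\alpha^2 = 0.75$.

```python
import numpy as np

rng = np.random.default_rng(42)
T, alpha = 400_000, 0.5
eps = rng.standard_normal(T + 1)           # IID N(0, 1): E[eps^3] = 0, E[eps^4] = 3

x_nlma = eps[1:] + alpha * eps[:-1] ** 2                # (12.1.4)
x_arch = eps[1:] * np.sqrt(alpha * eps[:-1] ** 2)       # (12.1.5)


def lag1_autocorr(x):
    d = x - x.mean()
    return d[1:] @ d[:-1] / (d @ d)


def third_moment(x):
    """Sample analogue of E[x_t x_{t-1}^2] (i = j = 1)."""
    return np.mean(x[1:] * x[:-1] ** 2)


def fourth_moment(x):
    """Sample analogue of E[x_t^2 x_{t-1}^2] (i = 0, j = k = 1)."""
    return np.mean(x[1:] ** 2 * x[:-1] ** 2)


for name, x in (("NLMA", x_nlma), ("ARCH", x_arch)):
    print(name, round(lag1_autocorr(x), 3), round(third_moment(x), 3), round(fourth_moment(x), 3))
```

Both series are serially uncorrelated, so a test built only on autocorrelations sees nothing. The nonlinear moving average shows up in a third moment with all lag indices positive, while the ARCH model shows up only once one lag index is zero.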

We discuss ARCH and other models of changing variance in Section 12.2; for the remainder of this section we concentrate on nonlinear models of the conditional mean. In Section 12.1.1 we explore several alternative ways to parametrize nonlinear models, and in Section 12.1.2 we use these parametric models to motivate and explain some commonly used tests for nonlinearity in univariate time series, including the test of Brock, Dechert, and Scheinkman (1987).


12.1.1 Some Parametric Models 


It is impossible to provide an exhaustive account of all nonlinear specifications, even when we restrict our attention to the subset of parametric models. Priestley (1988), Teräsvirta, Tjøstheim, and Granger (1994), and Tong (1990) provide excellent coverage of many of the most popular nonlinear time-series models, including more-specialized models with some very intriguing names, e.g., self-exciting threshold autoregression (SETAR), amplitude-dependent exponential autoregression (EXPAR), and state-dependent models (SDM). To provide a sense of the breadth of this area, we discuss four examples in this section: polynomial models, piecewise-linear models, Markov-switching models, and deterministic chaotic models.


Polynomial Models 


One way to represent the function $g(\cdot)$ is to expand it in a Taylor series around $\epsilon_{t-1} = \epsilon_{t-2} = \cdots = 0$, which yields a discrete-time Volterra series (see Volterra [1959]):
$$g(\epsilon_{t-1}, \epsilon_{t-2}, \ldots) = \sum_{i=1}^{\infty}\alpha_i\epsilon_{t-i} + \sum_{i=1}^{\infty}\sum_{j=i}^{\infty}\beta_{ij}\epsilon_{t-i}\epsilon_{t-j} + \sum_{i=1}^{\infty}\sum_{j=i}^{\infty}\sum_{k=j}^{\infty}\gamma_{ijk}\epsilon_{t-i}\epsilon_{t-j}\epsilon_{t-k} + \cdots. \qquad (12.1.6)$$
The single summation in (12.1.6) is a standard linear moving average, the double summation captures the effects of lagged cross-products of two innovations, the triple summation captures the effects of lagged cross-products of three innovations, and so on. The summations indexed by $j$ start at $i$, the summations indexed by $k$ start at $j$, and so on, to avoid counting a given cross-product of innovations more than once. The idea is to represent the true nonlinear function of past innovations as a weighted sum of polynomial functions of the innovations. Equation (12.1.4) is a simple example of a model of this form. Robinson (1979) and Priestley (1988) make extensive use of this specification.

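Polynomial models may also be written in autoregressive form. The function $g(\epsilon_{t-1}, \epsilon_{t-2}, \ldots)$ relating the conditional mean to past shocks may be rewritten as a function $g^*(x_{t-1}, x_{t-2}, \ldots)$ relating the conditional mean to lags of $x_t$. The autoregressive version of (12.1.6) is then
$$g^*(x_{t-1}, x_{t-2}, \ldots) = \sum_{i=1}^{\infty}a_i x_{t-i} + \sum_{i=1}^{\infty}\sum_{j=i}^{\infty}b_{ij}x_{t-i}x_{t-j} + \sum_{i=1}^{\infty}\sum_{j=i}^{\infty}\sum_{k=j}^{\infty}c_{ijk}x_{t-i}x_{t-j}x_{t-k} + \cdots. \qquad (12.1.7)$$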

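It is also possible to obtain mixed autoregressive/moving-average representations, the nonlinear equivalent of ARMA models. The bilinear model, for example, uses lagged values of $x_t$, lagged values of $\epsilon_t$, and cross-products of the two:
$$g(\epsilon_{t-1}, \ldots;\ x_{t-1}, \ldots) = \sum_{i=1}^{\infty}\alpha_i x_{t-i} + \sum_{j=1}^{\infty}\beta_j\epsilon_{t-j} + \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\gamma_{ij}x_{t-i}\epsilon_{t-j}. \qquad (12.1.8)$$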
This model can capture nonlinearities parsimoniously (with a finite, short lag length) when pure nonlinear moving-average or nonlinear autoregressive models fail to do so. Granger and Andersen (1978) and Subba Rao and Gabr (1984) explore bilinear models in detail.


Piecewise-Linear Models
Another popular way to fit nonlinear structure is to use piecewise-linear functions, as in the first-order threshold autoregression (TAR):
$$x_t = \begin{cases} \alpha_1 + \beta_1 x_{t-1} + \epsilon_t & \text{if } x_{t-1} < k, \\ \alpha_2 + \beta_2 x_{t-1} + \epsilon_t & \text{if } x_{t-1} \ge k. \end{cases} \qquad (12.1.9)$$
Here the intercept and slope coefficient in a regression of $x_t$ on its lag $x_{t-1}$ depend on the value of $x_{t-1}$ in relation to the threshold $k$. This model can be generalized to higher orders and multiple thresholds, as explained in detail in Tong (1983, 1990).
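A first-order TAR is simple to simulate and, for a given threshold, to estimate: the sample splits on $x_{t-1} < k$, and each regime is an ordinary least-squares regression of $x_t$ on a constant and $x_{t-1}$. The sketch below adds a grid search over $k$ that minimizes the total sum of squared residuals, a common approach that we assume here rather than one described in the text. Parameter values, the grid, and the function names are hypothetical.

```python
import numpy as np


def simulate_tar(T, a1, b1, a2, b2, k, seed=0):
    """Simulate the first-order TAR (12.1.9) with IID N(0, 1) shocks, starting at 0."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        if x[t - 1] < k:
            x[t] = a1 + b1 * x[t - 1] + eps[t]
        else:
            x[t] = a2 + b2 * x[t - 1] + eps[t]
    return x


def fit_tar(x, k):
    """Regime-by-regime OLS of x_t on (1, x_{t-1}) for threshold k.
    Returns ([(alpha1, beta1), (alpha2, beta2)], total sum of squared residuals)."""
    y, lag = x[1:], x[:-1]
    coefs, ssr = [], 0.0
    for mask in (lag < k, lag >= k):
        X = np.column_stack([np.ones(mask.sum()), lag[mask]])
        c, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        coefs.append(c)
        ssr += float(np.sum((y[mask] - X @ c) ** 2))
    return coefs, ssr


def estimate_threshold(x, grid):
    """Choose the threshold on `grid` with the smallest total SSR."""
    return min(grid, key=lambda k: fit_tar(x, k)[1])


x = simulate_tar(20_000, a1=1.0, b1=0.5, a2=-1.0, b2=0.5, k=0.0)
grid = np.quantile(x[:-1], np.linspace(0.15, 0.85, 141))  # trim the tails of the lag
k_hat = estimate_threshold(x, grid)
(a1_hat, b1_hat), (a2_hat, b2_hat) = fit_tar(x, k_hat)[0]
print(round(float(k_hat), 3), np.round([a1_hat, b1_hat, a2_hat, b2_hat], 3))
```

In this example the conditional mean jumps at the threshold (from about 1 just below $k$ to about $-1$ just above it), so the grid search locates $k$ sharply; when the conditional mean is continuous at the threshold, $k$ is much harder to pin down.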

Piecewise-linear models also include change-point models or, as they are known in the economics literature, models with structural breaks. In these models, the parameters are assumed to shift, typically once during a fixed sample period, and the goal is to estimate the two sets of parameters as well as the change point or structural break. Perron (1989) applies this technique to macroeconomic time series, and Brodsky (1993) and Carlstein, Muller, and Siegmund (1994) present more recent methods for dealing with change points, including nonparametric estimators and Bayesian inference.

Change-point methods are very well-established in the statistics and operations research literature, but their application to economic models is not without controversy. Unlike the typical engineering application where a structural break is known to exist in a given dataset, we can never say with certainty that a structural break exists in an economic time series. And if we think a structural break has occurred because of some major economic event, e.g., a stock market crash, this data-driven specification search can bias our inferences dramatically towards finding breaks where none exist (see, for example, Leamer [1978] and Lo and MacKinlay [1990]).
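The simplest version of the change-point estimation problem is a single break in the mean at an unknown date. A minimal least-squares sketch scans candidate break dates, fits a separate mean on each side, and keeps the date with the smallest total sum of squared residuals. The 15 percent trimming at each end, the simulated data, and the function name `estimate_break` are assumptions made for illustration.

```python
import numpy as np


def estimate_break(y, trim=0.15):
    """Least-squares estimate of a single break in the mean of y.
    Returns (tau, mean_before, mean_after), where y[:tau] precedes the break."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    best_tau, best_ssr = None, np.inf
    for tau in range(int(trim * T), int((1 - trim) * T) + 1):
        left, right = y[:tau], y[tau:]
        ssr = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if ssr < best_ssr:
            best_tau, best_ssr = tau, ssr
    return best_tau, y[:best_tau].mean(), y[best_tau:].mean()


rng = np.random.default_rng(1)
y_break = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(2.0, 1.0, 200)])
y_noise = rng.normal(0.0, 1.0, 500)      # no break at all
print(estimate_break(y_break))
print(estimate_break(y_noise))           # still reports a "break date"
```

The second call illustrates the concern raised above: given pure noise, the procedure still reports a break date and two different means, so the existence of a break has to be tested, with critical values that account for the search over dates, rather than read off the fitted split.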


Markov-Switching Models

The Markov-switching model of Hamilton (1989, 1990, 1993) and Sclove (1983a, 1983b) is closely related to the TAR. The key difference is that changes in regime are determined not by the level of the process, but by an unobserved state variable which is typically modeled as a Markov chain. For example,
$$x_t = \begin{cases} \alpha_1 + \beta_1 x_{t-1} + \sigma_1\epsilon_t & \text{if } s_t = 1, \\ \alpha_2 + \beta_2 x_{t-1} + \sigma_2\epsilon_t & \text{if } s_t = 0, \end{cases} \qquad (12.1.10)$$
where $s_t$ is an unobservable two-state Markov chain with some transition probability matrix $P$. Note the slightly different timing convention in (12.1.10): $s_t$ determines the regime at time $t$, not $t+1$. In both regimes, $x_t$ is an AR(1), but the parameters (including the variance of the error term) differ across regimes, and the change in regime is stochastic and possibly serially correlated.

This model has obvious appeal from an economic perspective. Changes in regime are caused by factors other than the series we are currently modeling ($s_t$ determines the regime, not $x_t$), rarely do we know which regime we are in ($s_t$ is unobservable), but after the fact we can often identify which regime we were in with some degree of confidence ($s_t$ can be estimated via Hamilton's [1989] filtering process). Moreover, the Markov-switching model does not suffer from some of the statistical biases that models of structural breaks do; the regime shifts are "identified" by the interaction between the data and the Markov chain, not by a priori inspection of the data. Hamilton's (1989) application to business cycles is an excellent illustration of the power and scope of this technique.
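The filtering step can be sketched in a few lines. With known parameters, the filter alternates a prediction step (push the previous period's regime probabilities through the transition matrix) and a Bayes update (reweight by the normal density of $x_t$ under each regime's AR(1)); the normalizing constants also deliver the log-likelihood one would maximize to estimate the parameters. In this sketch the parameter arrays are indexed by the value of $s_t$, so `alpha[1]` plays the role of $\alpha_1$ and `alpha[0]` that of $\alpha_2$; the filter starts from the chain's stationary distribution, the parameter values are hypothetical and well separated, and parameter estimation and smoothing are omitted.

```python
import numpy as np


def normal_pdf(e, sd):
    return np.exp(-0.5 * (e / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)


def simulate_switching_ar1(T, alpha, beta, sigma, P, seed=0):
    """Simulate (12.1.10). Parameter arrays are indexed by the regime s_t in {0, 1};
    P[i, j] = Pr(s_t = j | s_{t-1} = i)."""
    rng = np.random.default_rng(seed)
    s = np.zeros(T, dtype=int)
    x = np.zeros(T)
    for t in range(1, T):
        s[t] = rng.choice(2, p=P[s[t - 1]])
        x[t] = alpha[s[t]] + beta[s[t]] * x[t - 1] + sigma[s[t]] * rng.standard_normal()
    return x, s


def hamilton_filter(x, alpha, beta, sigma, P):
    """Filtered Pr(s_t = j | x_0, ..., x_t) for known parameters, and the
    log-likelihood of x_1, ..., x_{T-1} conditional on x_0."""
    pi0 = (1.0 - P[1, 1]) / (2.0 - P[0, 0] - P[1, 1])   # stationary Pr(s = 0)
    filt = np.empty((len(x), 2))
    filt[0] = (pi0, 1.0 - pi0)
    loglik = 0.0
    for t in range(1, len(x)):
        prior = filt[t - 1] @ P                                   # predict Pr(s_t = j | data to t-1)
        dens = normal_pdf(x[t] - alpha - beta * x[t - 1], sigma)  # density of x_t in each regime
        joint = prior * dens
        filt[t] = joint / joint.sum()                             # Bayes update
        loglik += np.log(joint.sum())
    return filt, loglik


P = np.array([[0.95, 0.05], [0.05, 0.95]])
alpha, beta, sigma = np.array([-2.0, 2.0]), np.array([0.5, 0.5]), np.array([1.0, 1.0])
x, s = simulate_switching_ar1(2_000, alpha, beta, sigma, P)
filt, loglik = hamilton_filter(x, alpha, beta, sigma, P)
accuracy = np.mean((filt[1:, 1] > 0.5) == (s[1:] == 1))
print(round(float(accuracy), 3), round(float(loglik), 1))
```

Because the two intercepts differ by four shock standard deviations here, the filtered probabilities classify almost every date correctly; with closer regimes the filter becomes much less decisive.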


Deterministic Nonlinear Dynamical Systems
There have been many exciting recent advances in modeling deterministic nonlinear dynamical systems, and these have motivated a number of techniques for estimating nonlinear relationships. Relatively simple systems of ordinary differential and difference equations have been shown to exhibit extremely complex dynamics. The popular term for such complexity is the Butterfly Effect, the notion that "a flap of a butterfly's wings in Brazil sets off a tornado in Texas."¹ This refers, only half-jokingly, to the following simple system of deterministic ordinary differential equations proposed by Lorenz (1963) for modeling weather patterns:
$$\dot{x} = 10(y - x) \qquad (12.1.11)$$
$$\dot{y} = -xz + 28x - y \qquad (12.1.12)$$
$$\dot{z} = xy - \tfrac{8}{3}z \qquad (12.1.13)$$

Lorenz (1963) observed that even the slightest change in the starting values of this system (in the fourth decimal place, for example) produces dramatically different sample paths, even after only a short time. This sensitivity to initial conditions is a hallmark of the emerging field of chaos theory.

¹This is adapted from the title of Edward Lorenz's address to the American Association for the Advancement of Science in Washington, D.C., December 1979. See Gleick (1987) for a lively and entertaining layman's account of the emerging science of nonlinear dynamical systems, or chaos theory.
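Sensitive dependence on initial conditions is easy to reproduce numerically. The sketch below integrates (12.1.11)-(12.1.13) with a fixed-step fourth-order Runge-Kutta scheme (step size 0.01, our choice) from two starting points that differ by $10^{-4}$ in $x$, and tracks the distance between the two paths. The starting point $(1, 1, 1)$ and the function names are illustrative assumptions.

```python
import numpy as np


def lorenz(state, s=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of (12.1.11)-(12.1.13)."""
    x, y, z = state
    return np.array([s * (y - x), -x * z + r * x - y, x * y - b * z])


def rk4_path(start, dt=0.01, steps=3_000):
    """Fixed-step fourth-order Runge-Kutta solution; row i is the state at time i * dt."""
    path = np.empty((steps + 1, 3))
    path[0] = start
    state = np.asarray(start, dtype=float)
    for i in range(steps):
        k1 = lorenz(state)
        k2 = lorenz(state + 0.5 * dt * k1)
        k3 = lorenz(state + 0.5 * dt * k2)
        k4 = lorenz(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        path[i + 1] = state
    return path


path_a = rk4_path([1.0, 1.0, 1.0])
path_b = rk4_path([1.0001, 1.0, 1.0])     # perturbed in the fourth decimal place
gap = np.linalg.norm(path_a - path_b, axis=1)
print(gap[[0, 100, 500, 1500, 3000]])
```

The initial gap of $10^{-4}$ grows until it is as large as the region the paths wander over, even though both paths obey the same three deterministic equations.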
An even simpler example of a chaotic system is the well-known tent map:
$$x_t = \begin{cases} 2x_{t-1} & \text{if } x_{t-1} < \tfrac{1}{2}, \\ 2(1 - x_{t-1}) & \text{if } x_{t-1} \ge \tfrac{1}{2}, \end{cases} \qquad x_0 \in (0, 1). \qquad (12.1.14)$$
The tent map can be viewed as a first-order threshold autoregression with no shock $\epsilon_t$, threshold $k = \tfrac{1}{2}$, and parameters $\alpha_1 = 0$, $\beta_1 = 2$, $\alpha_2 = 2$, and $\beta_2 = -2$. If $x_{t-1}$ lies between 0 and 1, $x_t$ also lies in this interval; thus the tent map maps the unit interval back into itself as illustrated in Figure 12.1. Data generated by (12.1.14) appear random in that they are uniformly distributed on the unit interval and are serially uncorrelated. Moreover, the data also exhibit sensitive dependence to initial conditions, which will be verified in Problem 12.1. Hsieh (1991) presents several other leading examples, while Brock (1986), Holden (1986), and Thompson and Stewart (1986) provide more formal discussions of the mathematics of chaotic systems.
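Iterating the tent map on a computer needs some care. In binary floating point each step of (12.1.14) is exact and shifts the finite binary expansion of $x_{t-1}$ one place, so every floating-point orbit collapses to 0 within a few dozen steps. The sketch below therefore uses exact rational arithmetic (Python's `fractions.Fraction`) with an odd denominator, which keeps the orbit away from $\tfrac{1}{2}$, 0, and 1, and compares two orbits whose starting values differ by about $10^{-6}$. The denominator and starting values are arbitrary choices for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def tent(x):
    """One step of the tent map (12.1.14); works for Fraction or float inputs."""
    return 2 * x if x < HALF else 2 * (1 - x)


def orbit(x0, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(tent(xs[-1]))
    return xs


Q = 1_000_003                                  # odd denominator (arbitrary choice)
orbit_a = orbit(Fraction(123_457, Q), 60)
orbit_b = orbit(Fraction(123_458, Q), 60)      # starts 1/Q (about 1e-6) away
gaps = [abs(float(a - b)) for a, b in zip(orbit_a, orbit_b)]
print([round(g, 4) for g in gaps[::10]])

# The same map in floating point: 0.1 is stored as a dyadic rational, so the
# orbit reaches 1/2, then 1, then 0, and stays at 0.
float_orbit = orbit(0.1, 100)
print(float_orbit[-3:])
```

The exact orbits separate to order one within a few dozen steps, which is the sensitive dependence described above; the floating-point orbit from 0.1 sticks at 0, an artifact of finite precision rather than a property of the map.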

Although the many important breakthroughs in nonlinear dynamical systems do have immediate implications for physics, biology, and other "hard" sciences, the impact on economics and finance has been less dramatic. While a number of economic applications have been considered,² none are especially compelling, particularly from an empirical perspective.


²See, for example, Boldrin and Woodford (1990), Brock and Sayers (1988), Craig, Kohlase, and Papell (1991), Day (1983), Grandmont and Malgrange (1986), Hsieh (1993), Kennan and O'Brien (1993), Pesaran and Potter (1992), Scheinkman and LeBaron (1989), and Scheinkman and Woodford (1994).

There are two serious problems in modeling economic phenomena as deterministic nonlinear dynamical systems. First, unlike the theory that is available in many natural sciences, economic theory is generally not specific about functional forms. Thus economists rarely have theoretical reasons for expecting to find one form of nonlinearity rather than another. Second, economists are rarely able to conduct controlled experiments, and this makes it almost impossible to deduce the parameters of a deterministic dynamical system governing economic phenomena, even if such a system exists and is low-dimensional. When controlled experiments are feasible, e.g., in particle physics, it is possible to recover the dynamics with great precision by taking many "snapshots" of the system at closely spaced time intervals. This technique, known as a stroboscopic map or a Poincaré section, has given empirical content to even the most abstract notions of nonlinear dynamical systems, but unfortunately cannot be applied to non-experimental data.

The possibility that a relatively simple set of nonlinear deterministic equations can generate the kind of complexities we see in financial markets is tantalizing, but it is of little interest if we cannot recover these equations with any degree of precision. Moreover, the impact of statistical sampling errors on a system with sensitive dependence to initial conditions makes dynamical systems theory even less practical. Of course, given the rapid pace at which this field is advancing, these reservations may be much less serious in a few years.


12.1.2 Univariate Tests for Nonlinear Structure


Despite the caveats of the previous section, the mathematics of chaos theory has motivated several new statistical tests for independence and nonlinear structure which are valuable in their own right, and we now discuss these tests.


Tests Based on Higher Moments

Our earlier discussion of higher moments of nonlinear models can serve as the basis for a statistical test of nonlinearity. Hsieh (1989), for example, defines a scaled third moment:
$$\gamma(i, j) = \frac{E[x_t x_{t-i} x_{t-j}]}{(E[x_t^2])^{3/2}}, \qquad (12.1.15)$$
and observes that $\gamma(i, j) = 0$ for all $i, j > 0$ for IID data, or data generated by a martingale model that is nonlinear only in variance. He suggests estimating $\gamma(i, j)$ in the obvious way:

i p Xp X 1 X19 
913/2 
[735] 


Under the null hypothesis that 90, J) 0, and with sufficient regularity condi- 
tions imposed on x so that higher moments exist, VTO, j) is asymptotically 
normal and its variance can be consistently estimated by 

XP XT XP; 


4. У 
T "nr 
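A minimal sketch of Hsieh's statistic using NumPy, assuming `x` is an already demeaned (or prefiltered) series; the function name `hsieh_third_moment` is hypothetical. It computes (12.1.16), the variance estimate (12.1.17), and the resulting z-ratio over the usable sample that remains after dropping the first $\max(i, j)$ observations.

```python
import numpy as np

def hsieh_third_moment(x, i, j):
    """Return (phi_hat, z) for Hsieh's scaled third moment at lags i, j >= 1.

    phi_hat follows (12.1.16); its variance estimate follows (12.1.17);
    z = sqrt(n) * phi_hat / sqrt(var_hat) is approximately N(0, 1) under the null.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    m = max(i, j)
    xt = x[m:]              # x_t for t = m, ..., T-1
    xi = x[m - i:T - i]     # x_{t-i}, aligned with xt
    xj = x[m - j:T - j]     # x_{t-j}, aligned with xt
    s2 = np.mean(x ** 2)    # (1/T) * sum of x_t^2
    phi_hat = np.mean(xt * xi * xj) / s2 ** 1.5
    var_hat = np.mean(xt ** 2 * xi ** 2 * xj ** 2) / s2 ** 3
    z = np.sqrt(len(xt)) * phi_hat / np.sqrt(var_hat)
    return phi_hat, z
```

For IID data the z-ratio should look like a standard normal draw. For a series that is nonlinear in mean, such as the bilinear process $x_t = e_t + 0.8\,e_{t-1}e_{t-2}$ (chosen only for illustration), $\phi(1,2) \neq 0$, and the z-ratio should be large.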


Hsieh's test uses one particular third moment of the data, but it is also possible to look at several moments simultaneously. The autoregressive polynomial model (12.4.7), for example, suggests that a simple test of nonlinearity in the mean is to regress $x_t$ onto its own lags and cross-products of its own lags, and to test for the joint significance of the nonlinear terms. Tsay (1986) proposes a test of this sort using second-order terms and $M$ lags for a total of $M(M+1)/2$ nonlinear regressors. One can calculate heteroskedasticity-consistent standard errors so that the test becomes robust to the presence of nonlinearity in variance.
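The regression test just described can be sketched as follows. This is a generic version, not necessarily Tsay's exact procedure: it assumes a Wald statistic built from White (HC0) heteroskedasticity-consistent standard errors, compared with a $\chi^2$ distribution with $M(M+1)/2$ degrees of freedom. The function name `tsay_type_test` is hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def tsay_type_test(x, M):
    """Wald test that all second-order lag terms are zero in a regression of x_t
    on a constant, x_{t-1}, ..., x_{t-M}, and the products x_{t-i} x_{t-j} (i <= j).

    Returns (W, df) with df = M(M+1)/2; compare W with a chi-squared(df) quantile.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    y = x[M:]
    lags = np.column_stack([x[M - i:T - i] for i in range(1, M + 1)])
    cross = np.column_stack([lags[:, p] * lags[:, q]
                             for p, q in combinations_with_replacement(range(M), 2)])
    X = np.column_stack([np.ones(len(y)), lags, cross])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = (X * u[:, None] ** 2).T @ X       # sum of u_t^2 x_t x_t'
    V = bread @ meat @ bread                 # HC0 covariance matrix of beta
    idx = np.arange(1 + M, X.shape[1])       # positions of the nonlinear terms
    b = beta[idx]
    W = float(b @ np.linalg.solve(V[np.ix_(idx, idx)], b))
    return W, len(idx)
```

The critical value can be taken from, e.g., `scipy.stats.chi2.ppf(0.95, df)`. The robust covariance is what makes the test insensitive to nonlinearity in variance alone.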


The Correlation Integral and the Correlation Dimension

To distinguish a deterministic, chaotic process from a truly random process, it is essential to view the data in a sufficiently high-dimensional form. In the case of the tent map, for example, the data appear random if one plots $x_t$ on the unit interval, since $x_t$ has a uniform distribution. If one plots $x_t$ and $x_{t+1}$ on the unit square, however, the data will all fall on the tent-shaped line shown in Figure 12.1.

This straightforward approach can yield surprising insights, as we saw in analyzing stock price discreteness in Chapter 3. However, it becomes difficult to implement when higher dimensions or more complicated nonlinearities are involved. Grassberger and Procaccia (1983) have suggested a formal approach to capture this basic idea. Their approach begins by organizing the data (prefiltered, if desired, to remove linear structure) into n-histories $x_t^n$, defined by

$$x_t^n \equiv (x_t, x_{t+1}, \ldots, x_{t+n-1}). \qquad (12.1.18)$$

The parameter $n$ is known as the embedding dimension.

The next step is to calculate the fraction of pairs of n-histories that are “close” to one another. To measure closeness, we pick a number $k$ and call a pair of histories $x_t^n$ and $x_s^n$ close to one another if the greatest absolute difference between the corresponding members of the pair is smaller than $k$: $\max_{i=0,\ldots,n-1} |x_{t+i} - x_{s+i}| < k$. We define a closeness indicator $K_{ts}$ that is one if the two n-histories are close to one another and zero otherwise:

$$K_{ts} = \begin{cases} 1 & \text{if } \max_{i=0,\ldots,n-1} |x_{t+i} - x_{s+i}| < k, \\ 0 & \text{otherwise.} \end{cases} \qquad (12.1.19)$$


We define $C_{n,T}(k)$ to be the fraction of pairs that are close in this sense, in a sample of n-histories of size $T$:

$$C_{n,T}(k) = \frac{\sum_{t<s} K_{ts}}{T(T-1)/2}. \qquad (12.1.20)$$


The correlation integral $C_n(k)$ is the limit of this fraction as the sample size increases:

$$C_n(k) = \lim_{T \to \infty} C_{n,T}(k). \qquad (12.1.21)$$

Equivalently, it is the probability that a randomly selected pair of n-histories is close.
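Definitions (12.1.18) to (12.1.20) translate almost line by line into code. The sketch below, using NumPy, builds all pairwise max-norm distances at once, which is simple but uses $O(T^2)$ memory, so it is meant for samples of a few thousand points. The helper names `n_histories` and `correlation_integral` are hypothetical.

```python
import numpy as np

def n_histories(x, n):
    """Rows are the n-histories x_t^n = (x_t, ..., x_{t+n-1}) of (12.1.18)."""
    x = np.asarray(x, dtype=float)
    T = len(x) - n + 1
    return np.column_stack([x[i:i + T] for i in range(n)])

def correlation_integral(x, n, k):
    """Sample correlation integral C_{n,T}(k) of (12.1.20), using the max-norm of (12.1.19)."""
    H = n_histories(x, n)
    T = len(H)
    dist = np.zeros((T, T))
    for i in range(n):   # greatest absolute difference over corresponding members
        dist = np.maximum(dist, np.abs(H[:, i, None] - H[None, :, i]))
    iu = np.triu_indices(T, k=1)           # each pair (t, s) with t < s counted once
    return float(np.mean(dist[iu] < k))    # sum of K_ts divided by T(T-1)/2
```

For example, with `x = [0.0, 0.1, 0.5]` and `k = 0.2`, only one of the three pairs of single points is close, so `correlation_integral(x, 1, 0.2)` is 1/3, while the two 2-histories are 0.4 apart and `correlation_integral(x, 2, 0.2)` is 0.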

Obviously the correlation integral will depend on both the embedding dimension $n$ and the parameter $k$. To see how $k$ can matter, set the embedding dimension $n = 1$ so that n-histories consist of single data points, and consider the case where the data are IID and uniformly distributed on the unit interval $(0, 1)$. In this case the fraction of data points that are within a distance $k$ of a benchmark data point is $2k$ when the benchmark data point is in the middle of the unit interval (between $k$ and $1-k$), but it is smaller when the benchmark data point lies near the edge of the unit interval. In the extreme case where the benchmark data point is zero or one, only a fraction $k$ of the other data points are within $k$ of the benchmark. The general formula for the fraction of data points that are close to a benchmark point $b$ is $\min(b+k,\, 2k,\, k+1-b)$. As $k$ shrinks, however, the complications caused by this “edge problem” become negligible and the correlation integral approaches $2k$.
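Averaging the edge formula over a uniformly distributed benchmark $b$ gives the exact value $C_1(k) = 2k - k^2$, so that $\log C_1(k)/\log k = 1 + \log(2-k)/\log k$. The sketch below, with hypothetical helper names, checks the average numerically and shows how slowly the ratio approaches one: the correction term vanishes only at the rate $1/\log k$, and the ratio is still about 0.925 at $k = 10^{-4}$.

```python
import numpy as np

def frac_close(b, k):
    """Fraction of U(0,1) points within distance k of benchmark b (0 < k <= 1/2)."""
    return np.minimum(np.minimum(b + k, 2 * k), k + 1 - b)

def c1_uniform(k, grid=100_000):
    """Average of frac_close over benchmarks b ~ U(0,1), by the midpoint rule."""
    b = (np.arange(grid) + 0.5) / grid
    return float(frac_close(b, k).mean())

def log_ratio(k):
    """log C_1(k) / log k using the closed form C_1(k) = 2k - k^2."""
    return np.log(2 * k - k ** 2) / np.log(k)
```

This slow convergence is one reason why, in practice, $v_n$ is often estimated from the slope of $\log C_n(k)$ against $\log k$ over a range of small $k$ rather than from a single ratio.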

Grassberger and Procaccia (1983) investigate the behavior of the correlation integral as the distance measure $k$ shrinks. They calculate the ratio of $\log C_n(k)$ to $\log k$ for small $k$:

$$v_n = \lim_{k \to 0} \frac{\log C_n(k)}{\log k}, \qquad (12.1.22)$$

which measures the proportional decrease in the fraction of points that are close to one another as we decrease the parameter that defines closeness. In the IID uniform case with $n = 1$, the ratio $\log C_1(k)/\log k$ approaches $\log 2k/\log k = (\log 2 + \log k)/\log k \to 1$ as $k$ shrinks. Thus $v_1 = 1$ for IID uniform data; for small $k$, the fraction of points that are close to one another shrinks at the same rate as $k$.

Now consider the behavior of the correlation integral with higher embedding dimensions $n$. When $n = 2$, we are plotting 2-histories of the data on a 2-dimensional diagram such as Figure 12.1 and asking what fraction of the 2-histories lie within a square whose center is a benchmark 2-history and whose sides are of length $2k$. With uniformly distributed IID data, a fraction $4k^2$ of the data points lie within such a square when the benchmark 2-history is sufficiently far away from the edges of the unit square. Again we handle the edge problem by letting $k$ shrink, and we find that the ratio $\log C_2(k)/\log k$ approaches $\log 4k^2/\log k = (\log 4 + 2\log k)/\log k \to 2$ as $k$ shrinks. Thus $v_2 = 2$ for IID uniform data; for small $k$, the fraction of pairs of points that are close to one another shrinks twice as fast as $k$. In general $v_n = n$ for IID uniform data; for small $k$, the fraction of n-histories that are close to one another shrinks $n$ times as fast as $k$.

The correlation integral behaves very differently when the data are generated by a nonlinear deterministic process. To see this, consider data generated by the tent map. In one dimension, such data fall uniformly on the unit line, so we again get $v_1 = 1$. But in two dimensions, all the data points fall on the tent-shaped line shown in Figure 12.1. For small $k$, the fraction of pairs of points that are close to one another shrinks at the same rate as $k$, so $v_2 = 1$. In higher dimensions a similar argument applies, and $v_n = 1$ for all $n$ when data are generated by the tent map.

The correlation dimension is defined to be the limit of $v_n$ as $n$ increases, when this limit exists:

$$v = \lim_{n \to \infty} v_n. \qquad (12.1.23)$$

Nonlinear deterministic processes are characterized by finite $v$.

The contrast between nonlinear deterministic data and IID uniform data generalizes to IID data with other distributions, since $v_n = n$ for IID data regardless of the distribution. The effect of the distribution averages out because we take each n-history in turn as a benchmark history when calculating the correlation integral. Thus Grassberger and Procaccia (1983) suggest that one can distinguish nonlinear deterministic data from IID random data by calculating $v_n$ for different $n$ and seeing whether it grows with $n$ or converges to some fixed limit. This approach requires large amounts of data, since one must use very small $k$ to calculate $v_n$, and no distribution theory is available for $v_n$.
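The contrast between $v_2 = 2$ for IID uniform data and $v_2 = 1$ for the tent map can be seen in a small simulation. The sketch below estimates $v_n$ by the least-squares slope of $\log C_n(k)$ on $\log k$ over a finite grid of $k$ values, a finite-sample proxy for the limit in (12.1.22). One substitution is worth flagging: iterating the tent map directly in binary floating point tends to collapse onto zero within a few dozen steps, so this sketch generates tent-map data through the logistic map $y_{t+1} = 4y_t(1-y_t)$ and the change of variables $x_t = (2/\pi)\arcsin\sqrt{y_t}$, which maps logistic-map orbits exactly onto tent-map orbits. All function names are hypothetical.

```python
import numpy as np

def corr_integral(x, n, k):
    """Sample correlation integral C_{n,T}(k) with the max-norm."""
    x = np.asarray(x, dtype=float)
    T = len(x) - n + 1
    H = np.column_stack([x[i:i + T] for i in range(n)])
    d = np.zeros((T, T))
    for i in range(n):
        d = np.maximum(d, np.abs(H[:, i, None] - H[None, :, i]))
    iu = np.triu_indices(T, k=1)
    return np.mean(d[iu] < k)

def slope_estimate(x, n, ks):
    """Least-squares slope of log C_n(k) on log k: a finite-k proxy for v_n."""
    log_c = np.log([corr_integral(x, n, k) for k in ks])
    return np.polyfit(np.log(ks), log_c, 1)[0]

def tent_series(T, y0=0.3):
    """Tent-map data via the logistic map and x = (2/pi) * arcsin(sqrt(y))."""
    y = np.empty(T)
    y[0] = y0
    for t in range(1, T):
        y[t] = 4.0 * y[t - 1] * (1.0 - y[t - 1])
    return (2.0 / np.pi) * np.arcsin(np.sqrt(np.clip(y, 0.0, 1.0)))
```

With about a thousand observations and $k$ between 0.01 and 0.1, the slope for $n = 2$ should come out near 2 for uniform IID data and near 1 for the tent-map data; much smaller $k$ would need far more data, which illustrates the data requirement noted above.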


The Brock-Dechert-Scheinkman Test 

Brock, Dechert, and Scheinkman (1987) have developed an alternative approach that is better suited to the limited amounts of data typically available in economics and finance. They show that even when $k$ is finite, if the data are IID then for any $n$

$$C_n(k) = C_1(k)^n. \qquad (12.1.24)$$

To understand this result, note that the ratio $C_{n+1}(k)/C_n(k)$ can be interpreted as a conditional probability:

$$\begin{aligned} \frac{C_{n+1}(k)}{C_n(k)} &= \Pr\Big(\max_{i=0,\ldots,n} |x_{t+i} - x_{s+i}| < k \;\Big|\; \max_{i=0,\ldots,n-1} |x_{t+i} - x_{s+i}| < k\Big) \\ &= \Pr\Big(|x_{t+n} - x_{s+n}| < k \;\Big|\; \max_{i=0,\ldots,n-1} |x_{t+i} - x_{s+i}| < k\Big). \end{aligned} \qquad (12.1.25)$$

That is, $C_{n+1}(k)/C_n(k)$ is the probability that two data points are close, given that the previous $n$ data points are close. If the data are IID, this must equal the unconditional probability that two data points are close, $C_1(k)$. Setting $C_{n+1}(k)/C_n(k) = C_1(k)$ for all positive $n$, we obtain (12.1.24).
Brock, Dechert, and Scheinkman (1987) propose the BDS test statistic,

$$B_{n,T}(k) = \sqrt{T}\,\frac{C_{n,T}(k) - C_{1,T}(k)^n}{\hat\sigma_{n,T}(k)}, \qquad (12.1.26)$$

where $C_{n,T}(k)$ and $C_{1,T}(k)$ are the sample correlation integrals defined in (12.1.20), and $\hat\sigma_{n,T}(k)$ is an estimator of the asymptotic standard deviation of $C_{n,T}(k) - C_{1,T}(k)^n$. The BDS statistic is asymptotically standard normal under the IID null hypothesis; it is applied and explained by Hsieh (1989) and Scheinkman and LeBaron (1989), who provide explicit expressions for $\hat\sigma_{n,T}(k)$. Hsieh (1989) and Hsieh (1991) report Monte Carlo results on the size and power of the BDS statistic in finite samples.
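The explicit expressions for $\hat\sigma_{n,T}(k)$ are somewhat involved, so the sketch below swaps in a simpler device that is plainly not the analytic BDS standard error. It standardizes the observed gap $C_{n,T}(k) - C_{1,T}(k)^n$ by the mean and standard deviation of the same gap across random permutations of the data. Permuting destroys any temporal dependence while preserving the marginal distribution (so $C_{1,T}(k)$ is unchanged), which is exactly the IID null. The function names and the choice of 99 permutations are illustrative assumptions.

```python
import numpy as np

def corr_int(x, n, k):
    """Sample correlation integral C_{n,T}(k) with the max-norm."""
    T = len(x) - n + 1
    H = np.column_stack([x[i:i + T] for i in range(n)])
    d = np.zeros((T, T))
    for i in range(n):
        d = np.maximum(d, np.abs(H[:, i, None] - H[None, :, i]))
    iu = np.triu_indices(T, k=1)
    return np.mean(d[iu] < k)

def bds_permutation_z(x, n, k, n_perm=99, seed=0):
    """BDS-type z-score using a permutation null instead of the analytic sigma-hat."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    c1 = corr_int(x, 1, k)           # C_{1,T}(k) does not depend on the ordering

    def gap(z):
        return corr_int(z, n, k) - c1 ** n

    observed = gap(x)
    null = np.array([gap(rng.permutation(x)) for _ in range(n_perm)])
    return (observed - null.mean()) / null.std(ddof=1)
```

A common practical choice is to set $k$ to a fraction of the sample standard deviation, e.g. `k = 0.5 * x.std()`. Deterministic chaotic data, such as iterates of the logistic map, should give a very large z-score, whereas IID noise should not.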

While there are some pathological nonlinear models for which $C_n(k) = C_1(k)^n$, as in IID data, the BDS statistic appears to have good power against the most commonly used nonlinear models. It is important to understand that it has power against models that are nonlinear in variance but not in mean, as well as models that are nonlinear in mean. Thus a BDS rejection does not necessarily imply that a time-series has a time-varying conditional mean; it could simply be evidence for a time-varying conditional variance. Hsieh (1991), for example, strongly rejects the hypothesis that common stock returns are IID using the BDS test. He then estimates models of the time-varying conditional variance of returns and gets much weaker evidence against the hypothesis that the residuals from such models are IID.


12.2 Models of Changing Volatility 


In this section we consider alternative ways to model the changing volatility of a time-series $\eta_t$. Section 12.2.1 presents univariate Autoregressive Conditionally Heteroskedastic (ARCH) and stochastic-volatility models, and Section 12.2.2 shows how these may be generalized to a multivariate setting. Section 12.2.3 covers models in which time-variation in the conditional mean is linked to time-variation in the conditional variance; these models are nonlinear in both mean and variance.

In order to concentrate on volatility, we assume that $\eta_{t+1}$ is an innovation, that is, it has mean zero conditional on time $t$ information. In a finance application, $\eta_{t+1}$ might be the innovation in an asset return. We define $\sigma_t^2$ to be the time $t$ conditional variance of $\eta_{t+1}$, or equivalently the conditional expectation $E_t[\eta_{t+1}^2]$. We assume that conditional on time $t$ information, the innovation is normally distributed:

$$\eta_{t+1} \sim N(0, \sigma_t^2). \qquad (12.2.1)$$

The unconditional variance of the innovation, $\sigma^2$, is just the unconditional expectation of $\sigma_t^2$:

$$\sigma^2 = E[\eta_{t+1}^2] = E\big[E_t[\eta_{t+1}^2]\big] = E[\sigma_t^2].$$

Thus variability of $\sigma_t^2$ around its mean does not change the unconditional variance $\sigma^2$.

The variability of $\sigma_t^2$ does, however, affect higher moments of the unconditional distribution of $\eta_{t+1}$. In particular, with time-varying $\sigma_t^2$ the unconditional distribution of $\eta_{t+1}$ has fatter tails than a normal distribution. To show this, we first write:

$$\eta_{t+1} = \sigma_t\, \varepsilon_{t+1}, \qquad (12.2.2)$$

where $\varepsilon_{t+1}$ is an IID random variable with zero mean and unit variance (as in the previous section) that is normally distributed (an assumption we did not make in the previous section).

As we discussed in Chapter 1, a useful measure of tail thickness for the distribution of a random variable $y$ is the normalized fourth moment, or kurtosis, defined by $K(y) \equiv E[y^4]/(E[y^2])^2$. It is well known that the kurtosis of a normal random variable is 3; hence $K(\varepsilon_{t+1}) = 3$. But for innovations $\eta_{t+1}$, we have

$$K(\eta_{t+1}) = \frac{E[\sigma_t^4]\,E[\varepsilon_{t+1}^4]}{(E[\sigma_t^2])^2} = \frac{3E[\sigma_t^4]}{(E[\sigma_t^2])^2} \geq \frac{3(E[\sigma_t^2])^2}{(E[\sigma_t^2])^2} = 3, \qquad (12.2.3)$$

where the first equality follows from the independence of $\sigma_t$ and $\varepsilon_{t+1}$, and the inequality is implied by Jensen's Inequality. Intuitively, the unconditional distribution is a mixture of normal distributions, some with small variances that concentrate mass around the mean and some with large variances that put mass in the tails of the distribution. Thus the mixed distribution has fatter tails than the normal.

The result that $\sigma^2 = E[\sigma_t^2]$ holds only because we are working with an innovation series that has a constant (zero) conditional mean. For a series with a time-varying conditional mean, the unconditional variance is not the same as the unconditional expectation of the conditional variance.
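The kurtosis formula (12.2.3) is easy to check for a simple two-point mixture. The sketch below (hypothetical function names) computes $3E[\sigma_t^4]/(E[\sigma_t^2])^2$ exactly and compares it with the sample kurtosis of simulated draws $\eta = \sigma\varepsilon$. If $\sigma_t^2$ equals 0.5 or 1.5 with equal probability, then $E[\sigma_t^2] = 1$, $E[\sigma_t^4] = 1.25$, and the kurtosis is $3 \times 1.25 = 3.75$.

```python
import numpy as np

def mixture_kurtosis(variances, probs):
    """3 E[sigma^4] / (E[sigma^2])^2 for eta = sigma * eps with eps ~ N(0, 1), as in (12.2.3)."""
    v = np.asarray(variances, dtype=float)
    p = np.asarray(probs, dtype=float)
    return 3.0 * np.sum(p * v ** 2) / np.sum(p * v) ** 2

def sample_kurtosis(y):
    """E[y^4] / (E[y^2])^2 for a mean-zero sample."""
    y = np.asarray(y, dtype=float)
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2
```

With a million simulated draws, the sample kurtosis should lie close to 3.75, comfortably above the normal value of 3.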

We now consider alternative ways of modeling and estimating the $\sigma_t^2$ process. The literature on this subject is enormous, and so our review is inevitably selective. Bollerslev, Chou, and Kroner (1992), Bollerslev, Engle, and Nelson (1994), and Hamilton (1994) provide much more comprehensive surveys.


12.2.1 Univariate Models


Early research on time-varying volatility extracted volatility estimates from asset return data before specifying a parametric time-series model for volatility. Officer (1973), for example, used a rolling standard deviation (the standard deviation of returns measured over a subsample which moves forward through time) to estimate volatility at each point in time. Other researchers have used the difference between the high and low prices on a given day to estimate volatility for that day (Garman and Klass [1980], Parkinson [1980]). Such methods implicitly assume that volatility is constant over some interval of time.
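Both kinds of estimator are short to write down. The sketch below shows a trailing-window rolling standard deviation in the spirit of Officer (1973), and the widely used form of Parkinson's (1980) high-low estimator, $(\ln H - \ln L)^2/(4\ln 2)$, which is unbiased for the per-period variance when the log price follows a driftless Brownian motion observed continuously. The function names are hypothetical, and both estimators embody the constant-volatility-within-the-interval assumption just mentioned.

```python
import numpy as np

def rolling_std(r, window):
    """Standard deviation of r over each trailing window of `window` observations."""
    r = np.asarray(r, dtype=float)
    return np.array([r[t - window:t].std(ddof=1)
                     for t in range(window, len(r) + 1)])

def parkinson_variance(high, low):
    """Per-period variance estimate (ln H - ln L)^2 / (4 ln 2), after Parkinson (1980)."""
    log_range = (np.log(np.asarray(high, dtype=float))
                 - np.log(np.asarray(low, dtype=float)))
    return log_range ** 2 / (4.0 * np.log(2.0))
```

When high and low prices come from a discretely sampled path, the observed range understates the true range, so the Parkinson estimate is biased downward. This is one reason the frequency of the underlying price data matters.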

These methods are often quite accurate if the objective is simply to
measure volatility at a point in time; as Merton (1980) observed, if an asset
price follows a diffusion with constant volatility, e.g., a geometric Brownian 
motion, volatility can be estimated arbitrarily accurately with an arbitrarily 
short sample period if one measures prices sufficiently frequently.* Nelson 
(1992) has shown that a similar argument can be made even when volatility 
changes through time, provided that the conditional distribution of returns 
is not too fat-tailed and that volatility changes are sufficiently gradual. 

It is, however, both logically inconsistent and statistically inefficient to 
use volatility measures that are based on the assumption of constant volatility 
over some period when the resulting series moves through time. To handle 
this, more recent work specifies a parametric model for volatility first, and 
then uses the model to extract volatility estimates from the data on returns. 


ARCH Models


A basic observation about asset return data is that large returns (of either sign) tend to be followed by more large returns (of either sign). In other words, the volatility of asset returns appears to be serially correlated. This can be seen visually in Figure 12.2, which plots monthly excess returns on the CRSP value-weighted stock index over the period from 1926 to 1994. The individual monthly returns vary wildly, but they do so within a range which itself changes slowly over time. The range for returns is very wide in the 1930s, for example, and much narrower in the 1950s and 1960s.

*See Section 9.3.2 of Chapter 9. Note, however, that high-frequency price data are often severely affected by microstructure problems of the sort discussed in Chapter 3. This has limited the usefulness of the high-low method of Garman and Klass (1980) and Parkinson (1980).

Figure 12.2. Monthly Excess Log US Stock Returns, 1926 to 1994

An alternative way to understand this is to calculate serial correlation 
coefficients for squared excess returns or absolute excess returns. At 0.23 
and 0.21, respectively, the first-order serial correlation coefficients for these 
series are about twice as large as the first-order serial correlation coefficient 
for returns themselves, 0.11, and are highly statistically significant since the 
standard error under the null of no serial correlation is $1/\sqrt{T} = 0.036$.
The difference is even more dramatic in the average of the first 12 auto- 
correlation coefficients: 0.20 for squared excess returns, 0.21 for absolute 
excess returns, and 0.02 for excess returns themselves. This reflects the fact 
that the autocorrelations of squared and absolute returns die out only very 
slowly. 

To capture the serial correlation of volatility, Engle (1982) proposed the class of Autoregressive Conditionally Heteroskedastic, or ARCH, models. These write the conditional variance as a distributed lag of past squared innovations:

$$\sigma_t^2 = \omega + \alpha(L)\,\eta_t^2, \qquad (12.2.4)$$

where $\alpha(L)$ is a polynomial in the lag operator. To keep the conditional variance positive, $\omega$ and the coefficients in $\alpha(L)$ must be nonnegative.

As a way to model persistent movements in volatility without estimating a very large number of coefficients in a high-order polynomial $\alpha(L)$, Bollerslev (1986) suggested the Generalized Autoregressive Conditionally Heteroskedastic, or GARCH, model:

$$\sigma_t^2 = \omega + \beta(L)\,\sigma_{t-1}^2 + \alpha(L)\,\eta_t^2, \qquad (12.2.5)$$

where $\beta(L)$ is also a polynomial in the lag operator. By analogy with ARMA models, this is called a GARCH$(p, q)$ model when the order of the polynomial $\beta(L)$ is $p$ and the order of the polynomial $\alpha(L)$ is $q$. The most commonly used model in the GARCH class is the simple GARCH(1,1), which can be written as

$$\begin{aligned} \sigma_t^2 &= \omega + \beta\sigma_{t-1}^2 + \alpha\eta_t^2 \\ &= \omega + (\alpha + \beta)\sigma_{t-1}^2 + \alpha(\eta_t^2 - \sigma_{t-1}^2) \\ &= \omega + (\alpha + \beta)\sigma_{t-1}^2 + \alpha\sigma_{t-1}^2(\varepsilon_t^2 - 1). \end{aligned} \qquad (12.2.6)$$


In the second equality in (12.2.6), the term $(\eta_t^2 - \sigma_{t-1}^2)$ has mean zero, conditional on time $t-1$ information, and can be thought of as the shock to volatility. The coefficient $\alpha$ measures the extent to which a volatility shock today feeds through into next period's volatility, while $(\alpha + \beta)$ measures the rate at which this effect dies out over time. The third equality in (12.2.6) rewrites the volatility shock as $\sigma_{t-1}^2(\varepsilon_t^2 - 1)$: the square of a standard normal less its mean (that is, a demeaned $\chi^2(1)$ random variable) multiplied by past volatility $\sigma_{t-1}^2$.

The GARCH(1,1) model can also be written in terms of its implications for squared innovations $\eta_{t+1}^2$. We have

$$\eta_{t+1}^2 = \omega + (\alpha + \beta)\eta_t^2 + (\eta_{t+1}^2 - \sigma_t^2) - \beta(\eta_t^2 - \sigma_{t-1}^2). \qquad (12.2.7)$$

This representation makes it clear that the GARCH(1,1) model is an ARMA(1,1) model for squared innovations; but a standard ARMA(1,1) model has homoskedastic shocks, while here the shocks $(\eta_{t+1}^2 - \sigma_t^2)$ are themselves heteroskedastic.
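A short simulation makes the ARMA(1,1) structure of squared innovations visible. The sketch below (hypothetical names) runs the recursion (12.2.6) with $\eta_{t+1} = \sigma_t\varepsilon_{t+1}$, starting from the unconditional variance, so it assumes $\alpha + \beta < 1$. It also computes first-order sample autocorrelations. With $\omega = 0.1$, $\alpha = 0.1$, $\beta = 0.8$, the innovations themselves should be nearly uncorrelated, while their squares should show clear positive autocorrelation. A standard result for GARCH(1,1) with a finite fourth moment puts the first autocorrelation of $\eta_t^2$ at $\alpha(1-\alpha\beta-\beta^2)/(1-2\alpha\beta-\beta^2)$, about 0.14 here.

```python
import numpy as np

def simulate_garch11(T, omega, alpha, beta, seed=0, burn=500):
    """Simulate eta_{t+1} = sigma_t * eps_{t+1} with the GARCH(1,1) recursion (12.2.6).

    Returns (eta, sigma2), where sigma2[t] is the conditional variance used for eta[t].
    Requires alpha + beta < 1 so that the start value omega / (1 - alpha - beta) exists.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T + burn)
    eta = np.empty(T + burn)
    sigma2 = np.empty(T + burn)
    s2 = omega / (1.0 - alpha - beta)      # start at the unconditional variance
    for t in range(T + burn):
        sigma2[t] = s2
        eta[t] = np.sqrt(s2) * eps[t]
        s2 = omega + beta * s2 + alpha * eta[t] ** 2   # next period's conditional variance
    return eta[burn:], sigma2[burn:]

def autocorr(z, lag=1):
    """Sample autocorrelation of z at the given lag."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    return float(np.dot(z[lag:], z[:-lag]) / np.dot(z, z))
```

For example, `eta, s2 = simulate_garch11(20000, 0.1, 0.1, 0.8)` followed by `autocorr(eta)` and `autocorr(eta ** 2)` reproduces, in miniature, the contrast between returns and squared returns described earlier for US stock data.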


Persistence and Stationarity 

In the GARCH(1,1) model it is easy to construct multiperiod forecasts of volatility. When $\alpha + \beta < 1$, the unconditional variance of $\eta_{t+1}$, or equivalently the unconditional expectation of $\sigma_t^2$, is $\omega/(1 - \alpha - \beta)$. Recursively substituting in (12.2.6), and using the law of iterated expectations, the conditional expectation of volatility $j$ periods ahead is

$$E_t[\sigma_{t+j}^2] = (\alpha + \beta)^j\left(\sigma_t^2 - \frac{\omega}{1 - \alpha - \beta}\right) + \frac{\omega}{1 - \alpha - \beta}. \qquad (12.2.8)$$

The multiperiod volatility forecast reverts to its unconditional mean at rate $(\alpha + \beta)$. This relation between single-period and multiperiod forecasts is the same as in a linear ARMA(1,1) model with autoregressive coefficient $(\alpha + \beta)$. Multiperiod forecasts can be constructed in a similar fashion for higher-order GARCH models.

When $\alpha + \beta = 1$, the conditional expectation of volatility $j$ periods ahead is instead

$$E_t[\sigma_{t+j}^2] = \sigma_t^2 + j\omega. \qquad (12.2.9)$$

The GARCH(1,1) model with $\alpha + \beta = 1$ has a unit autoregressive root, so that today's volatility affects forecasts of volatility into the indefinite future. It is therefore known as an integrated GARCH, or IGARCH(1,1), model.
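Both forecast formulas follow from iterating the one-step relation $E_t[\sigma_{t+i}^2] = \omega + (\alpha+\beta)\,E_t[\sigma_{t+i-1}^2]$. The sketch below (hypothetical function name) implements (12.2.8) and (12.2.9), and the result can be checked against that recursion. For $\sigma_t^2 = 2$, $\omega = 0.1$, $\alpha = 0.1$, $\beta = 0.8$, and $j = 5$, the unconditional variance is 1 and the forecast is $1 + 0.9^5 = 1.59049$.

```python
def garch11_forecast(sigma2_t, omega, alpha, beta, j):
    """E_t[sigma^2_{t+j}] for a GARCH(1,1) model.

    Uses (12.2.8) when alpha + beta < 1 and (12.2.9) when alpha + beta = 1.
    """
    persistence = alpha + beta
    if abs(persistence - 1.0) < 1e-12:
        return sigma2_t + j * omega                        # IGARCH(1,1), (12.2.9)
    if persistence > 1.0:
        raise ValueError("formula (12.2.8) assumes alpha + beta < 1")
    mean = omega / (1.0 - persistence)                     # unconditional variance
    return persistence ** j * (sigma2_t - mean) + mean     # reverts at rate alpha + beta
```

In the IGARCH case the forecast grows linearly in $j$ and never reverts, which is the practical meaning of the unit root.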

The IGARCH(1,1) process for $\sigma_t^2$ looks very much like a linear random walk with drift $\omega$. However, Nelson (1990) shows that this analogy must be treated with caution. A linear random walk is nonstationary in two senses. First, it has no stationary distribution, hence the process is not strictly stationary. Second, it has no unconditional first or second moments, hence it is not covariance stationary. In the IGARCH(1,1) model, on the other hand, $\sigma_t^2$ is strictly stationary even though its stationary distribution generally lacks unconditional moments. Thus the IGARCH(1,1) model is strictly stationary but not generally covariance stationary.

It is particularly easy to show that the IGARCH(1,1) model has a stationary distribution in the case where $\omega = 0$. Here (12.2.9) simplifies to $E_t[\sigma_{t+j}^2] = \sigma_t^2$, so volatility is a martingale. At the same time, volatility remains bounded below because it cannot go negative. But the martingale convergence theorem states that a martingale that is bounded below must converge; in this case, the only value to which it can converge is zero. The stationary distribution for $\sigma_t^2$ is then a degenerate distribution with point mass at zero, and this implies that the stationary distribution for $\eta_{t+1}$ is also degenerate at zero. In this case the stationary distributions for $\sigma_t^2$ and $\eta_{t+1}$ have moments, but they are all trivially zero.

When $\omega > 0$, Nelson (1990) shows that there exists a nondegenerate stationary distribution for $\sigma_t^2$. But this distribution does not have a finite mean or higher moments. The innovation $\eta_{t+1}$ then has a stationary distribution with a zero mean, but with tails that are so thick that no second- or higher-order moments exist.


Nelson shows that these properties hold more generally for GARCH(1,1) models with $\alpha + \beta > 1$ but with $E[\log(\beta + \alpha\varepsilon_t^2)] < 0$.

Alternative Functional Forms

In the standard GARCH model, forecasts of future variance are linear in current and past variances, and squared returns drive revisions in the forecasts. An alternative model, sometimes known as the absolute value GARCH model, makes forecasts of future standard deviation linear in current and past standard deviations and has absolute values of returns driving revisions in the forecasts. An absolute value GARCH(1,1) model, for example, would be

$$\sigma_t = \omega + \beta\sigma_{t-1} + \alpha\sigma_{t-1}|\varepsilon_t|. \qquad (12.2.10)$$

Schwert (1989) and Taylor (1986) estimate absolute value ARCH models, while Nelson and Foster (1994) discuss the absolute value GARCH(1,1).

The models we have considered so far are symmetric in that negative and positive shocks $\varepsilon_t$ have the same effect on volatility. However, Black (1976) and many others have pointed out that there appears to be an asymmetry in stock market data: negative innovations to stock returns tend to increase volatility more than positive innovations of the same magnitude. Possible explanations for this asymmetry are discussed in Section 12.2.3. To handle this, one can generalize the absolute value GARCH model to

$$\sigma_t = \omega + \beta\sigma_{t-1} + \alpha\sigma_{t-1}f(\varepsilon_t), \qquad (12.2.11)$$

where

$$f(\varepsilon_t) = |\varepsilon_t - b| - c(\varepsilon_t - b). \qquad (12.2.12)$$

Here the shift parameter $b$ and the tilt parameter $c$ measure two different types of asymmetry. $b$ is unrestricted, but we need $|c| \leq 1$ to ensure that $f(\varepsilon_t) \geq 0$. When $c = 0$ but $b \neq 0$, the effect of a shock on volatility depends on its distance from $b$, so that volatility increases more when there is no shock than when there is a shock of size $b$. When $b = 0$ but $c \neq 0$, a zero shock has the smallest impact on volatility, but there is a distinction between positive and negative shocks; a shock of given size may have a larger effect when it is negative than when it is positive, or vice versa. Following Hentschel (1995), a nice way to understand (12.2.12) is to plot $f(\varepsilon)$ against $\varepsilon$, as in Figure 12.3. Panel (a) of the figure shows the absolute-value function ($b = 0$, $c = 0$); this is plotted again as a dashed line in each of the other panels. Panel (b) shows the shifted absolute-value function ($b = 0.5$, $c = 0$), panel (c) shows the tilted absolute-value function ($b = 0$, $c = 0.25$), and panel (d) shows a shifted and tilted absolute-value function ($b = 0.5$, $c = -0.25$).
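The four panels of Figure 12.3 can be tabulated directly from (12.2.12). In the sketch below (hypothetical function name), a positive tilt $c$ makes negative shocks count for more: with $c = 0.25$, $f(-1) = 1.25$ while $f(1) = 0.75$. With a shift $b$, the minimum of $f$ moves to $\varepsilon = b$.

```python
import numpy as np

def f_shift_tilt(eps, b=0.0, c=0.0):
    """f(eps) = |eps - b| - c * (eps - b) from (12.2.12); nonnegative whenever |c| <= 1."""
    eps = np.asarray(eps, dtype=float)
    return np.abs(eps - b) - c * (eps - b)

# Parameter pairs (b, c) for panels (a) to (d) of Figure 12.3
panels = {"(a)": (0.0, 0.0), "(b)": (0.5, 0.0), "(c)": (0.0, 0.25), "(d)": (0.5, -0.25)}
grid = np.linspace(-3.0, 3.0, 7)
for name, (b, c) in panels.items():
    print(name, np.round(f_shift_tilt(grid, b, c), 2))
```

Nonnegativity follows because $|x| - cx \geq |x|(1 - |c|) \geq 0$ whenever $|c| \leq 1$.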

Hentschel (1995) further generalizes (12.2.11) to allow a power of $f(\varepsilon_t)$, rather than $f(\varepsilon_t)$ itself, to affect volatility, and to allow a power of $\sigma_t$, rather than $\sigma_t$ itself, to be the variable that follows a linear difference equation. The resulting equation is

$$\frac{\sigma_t^\lambda - 1}{\lambda} = \omega + \beta\,\frac{\sigma_{t-1}^\lambda - 1}{\lambda} + \alpha\,\sigma_{t-1}^\lambda f^\nu(\varepsilon_t). \qquad (12.2.13)$$

Plotting $f(\varepsilon)$ against $\varepsilon$ is similar to the “news impact curve” of Pagan and Schwert (1990) and Engle and Ng (1993), which plots $\sigma_t^2$ against $\eta_t$, holding any other relevant state variables at their unconditional means.

Figure 12.3. Shifted and Tilted Absolute-Value Function


Equation (12.2.13) defines a family of models that includes most of the popular GARCH-type models in the literature. The standard GARCH model sets $\lambda = \nu = 2$ and $b = c = 0$. Glosten, Jagannathan, and Runkle (1993) have generalized the standard GARCH model to allow nonzero $c$. Engle and Ng (1993) have instead allowed nonzero $b$. The absolute value GARCH model sets $\lambda = \nu = 1$ with free $b$ and $c$. Another particularly important member of the family (12.2.13) is the exponential GARCH or EGARCH model of Nelson (1990), which is obtained by setting $\lambda = 0$, $\nu = 1$, and $b = 0$ (interpreting $\lambda = 0$ as the limit in which $(\sigma^\lambda - 1)/\lambda \to \log\sigma$) to get

$$\log(\sigma_t) = \omega + \beta\log(\sigma_{t-1}) + \alpha\big[|\varepsilon_t| - c\,\varepsilon_t\big]. \qquad (12.2.14)$$

See also Ding, Granger, and Engle (1993) for a related family of models.

This model is appealing because it does not require any parameter restrictions to ensure that the conditional variance of the return is always positive. Also, it becomes both strictly nonstationary and covariance nonstationary when $\beta = 1$, so it does not share the unusual statistical properties of the IGARCH(1,1) model. On the other hand, multiperiod forecasts of future variances are harder to calculate in the EGARCH model; no closed-form expressions like (12.2.8) are available.
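One step of the recursion (12.2.13) can be sketched as below. The function name is hypothetical, and $\lambda = 0$ is handled as the Box-Cox limit that yields EGARCH (12.2.14). Note that the intercept $\omega$ in this parameterization is not the same number as in the standard GARCH model. With $\lambda = \nu = 2$ and $b = c = 0$, the recursion reduces to $\sigma_t^2 = (2\omega + 1 - \beta) + \beta\sigma_{t-1}^2 + 2\alpha\,\sigma_{t-1}^2\varepsilon_t^2$, i.e. a standard GARCH(1,1) with intercept $2\omega + 1 - \beta$ and ARCH coefficient $2\alpha$.

```python
import numpy as np

def hentschel_step(sigma_prev, eps, omega, alpha, beta, lam, nu, b=0.0, c=0.0):
    """Return sigma_t from sigma_{t-1} and eps_t under the family (12.2.13).

    lam == 0 is treated as the Box-Cox limit, in which (sigma^lam - 1) / lam
    becomes log(sigma) and sigma_{t-1}^lam becomes 1 (the EGARCH case).
    """
    f = np.abs(eps - b) - c * (eps - b)                    # shift/tilt function (12.2.12)
    if lam == 0:
        return np.exp(omega + beta * np.log(sigma_prev) + alpha * f ** nu)
    rhs = (omega + beta * (sigma_prev ** lam - 1.0) / lam
           + alpha * sigma_prev ** lam * f ** nu)
    return (lam * rhs + 1.0) ** (1.0 / lam)                # invert the Box-Cox transform
```

Running this step through time requires only a starting value for $\sigma$ and the standardized shocks $\varepsilon_t = \eta_t/\sigma_{t-1}$, which is also what the likelihood recursion needs.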


Estimation 

We have introduced an almost bewildering variety of volatility models. To discover which features of these models are important in fitting financial data, one must be able to estimate the models' parameters. Fortunately this is fairly straightforward for GARCH models and other models in the class defined by (12.2.13). Conditional on the parameters of the model and an initial variance estimate, the data are normally distributed and we can construct a likelihood function recursively. We write the vector of model parameters as $\theta$, define $\sigma_t(\theta)$ to be the conditional standard deviation at time $t$ implied by the parameters and the history of returns, and define $\varepsilon_{t+1}(\theta) \equiv \eta_{t+1}/\sigma_t(\theta)$. When $\theta$ contains the true parameters of the model, $\varepsilon_{t+1}(\theta)$ is IID with density function $g(\varepsilon_{t+1}(\theta))$, which we have assumed to be standard normal:

$$g(\varepsilon_{t+1}(\theta)) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\varepsilon_{t+1}(\theta)^2}{2}\right). \qquad (12.2.15)$$

The conditional log likelihood of $\eta_{t+1}$ is therefore

$$\begin{aligned} \ell(\eta_{t+1}, \theta) &= \log g\big(\eta_{t+1}/\sigma_t(\theta)\big) - \log\big(\sigma_t^2(\theta)\big)/2 \\ &= -\log(\sqrt{2\pi}) - \frac{\eta_{t+1}^2}{2\sigma_t^2(\theta)} - \log\big(\sigma_t^2(\theta)\big)/2, \end{aligned} \qquad (12.2.16)$$

where the last term is a Jacobian term that appears because we observe $\eta_{t+1}$ and not $\eta_{t+1}/\sigma_t(\theta)$. The log likelihood of the whole data set $\eta_1, \ldots, \eta_T$ is

$$\mathcal{L}(\eta_1, \ldots, \eta_T, \theta) = \sum_{t=0}^{T-1} \ell(\eta_{t+1}, \theta). \qquad (12.2.17)$$

The maximum likelihood estimator is the choice of parameters $\hat\theta$ that maximizes (12.2.17).
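For the GARCH(1,1) case, the recursive likelihood (12.2.16) to (12.2.17) can be sketched as follows. The function name is hypothetical, and the default initial variance (the sample variance of the data) is an assumption made for illustration; as noted above, the initial condition does not affect consistency. Parameter values outside the region that keeps $\sigma_t^2$ positive return $-\infty$.

```python
import numpy as np

def garch11_loglik(params, eta, sigma2_0=None):
    """Gaussian GARCH(1,1) log likelihood, summing (12.2.16) as in (12.2.17).

    params = (omega, alpha, beta); eta holds eta_1, ..., eta_T.
    sigma2_0 is the initial conditional variance; by default (an illustrative
    assumption) the sample variance of eta is used.
    """
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0:
        return -np.inf                       # outside the region that keeps sigma^2 > 0
    eta = np.asarray(eta, dtype=float)
    s2 = float(np.var(eta)) if sigma2_0 is None else float(sigma2_0)
    ll = 0.0
    for e in eta:
        # l = -log sqrt(2 pi) - e^2 / (2 s2) - log(s2) / 2
        ll += -0.5 * np.log(2 * np.pi) - 0.5 * e ** 2 / s2 - 0.5 * np.log(s2)
        s2 = omega + beta * s2 + alpha * e ** 2    # conditional variance for the next observation
    return ll
```

The estimator itself is then obtained by passing the negative of this function to a numerical optimizer, for example `scipy.optimize.minimize`, with bounds or a reparameterization that keeps $\omega > 0$ and $\alpha, \beta \geq 0$.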


In practice one needs an initial $\sigma_0^2$ to begin calculating the conditional likelihoods in (12.2.16). The influence of the initial condition diminishes over time and becomes negligible asymptotically; thus the choice of initial condition does not affect the consistency of the estimator.

Although it is easy to show that the maximum likelihood estimator is consistent, it is harder to prove that it is asymptotically normal. The difficulty is that this requires regularity conditions which are hard to verify for GARCH processes. Lee and Hansen (1994) give some results for the GARCH(1,1) model, but few other results are available. Empirical researchers typically ignore this problem and assume that the usual regularity conditions hold. Some simulation evidence (Bollerslev and Wooldridge [1992] and Lumsdaine [1995]) supports this practice.

Hentschel (1995) provides maximum likelihood estimates for a great variety of models in the family (12.2.13) using daily and monthly stock return data over the period from 1926 to 1990. To estimate the parameters $\lambda$ and $\nu$ with any precision, Hentschel finds that he needs the very large number of observations provided by daily data. These data suggest that $\lambda$ is close to one (as in the absolute value GARCH model), but that $\nu$ is greater than one, in fact close to 1.5. In both daily and monthly data, Hentschel finds that asymmetry is better modeled with the shift parameter $b$ than with the tilt parameter $c$. Thus US stock returns are well described by a GARCH model for the conditional standard deviation, driven by the shifted absolute value of shocks raised to the power three halves. The volatility process is highly persistent in all the models estimated, although the degree of persistence is sensitive to specification in the post-World War II period.


Additional Explanatory Variables 

Up to this point we have modeled volatility using only the past history of returns themselves. It is straightforward to add other explanatory variables. For example, one can write an augmented GARCH(1,1) model as

$$\sigma_t^2 = \omega + \gamma X_t + \beta\sigma_{t-1}^2 + \alpha\eta_t^2, \qquad (12.2.18)$$

where $X_t$ is any variable known at time $t$. Provided that $X_t \geq 0$ and $\gamma \geq 0$, this model still constrains volatility to be positive. Alternatively, one can add explanatory variables to the EGARCH model without any sign restrictions. Glosten, Jagannathan, and Runkle (1993) add a short-term nominal interest rate to various GARCH models and show that it has a significant positive effect on stock market volatility.


Conditional Nonnormality 

The GARCH models we have considered imply that the distribution of returns, conditional on the past history of returns, is normal. Equivalently, the standardized residuals of these models, $\varepsilon_{t+1}(\hat\theta) = \eta_{t+1}/\sigma_t(\hat\theta)$, should be normal. Unfortunately, in practice there is excess kurtosis in the standardized residuals of GARCH models, albeit less than in the raw returns (see, for example, Bollerslev [1987] and Nelson [1991]).

One way to handle this problem is to continue to work with the conditional normal likelihood function defined by (12.2.16) and (12.2.17), but to interpret the estimator as a quasi-maximum likelihood estimator (White [1982]). Standard errors for parameter estimates can then be calculated using a robust covariance matrix estimator, as discussed by Bollerslev and Wooldridge (1992).

Alternatively, one can explicitly model the fat-tailed distribution of the shocks driving a GARCH process. Bollerslev (1987), for example, suggests a Student $t$ distribution with $k$ degrees of freedom, scaled to have unit variance (which requires $k > 2$):

$$g(\varepsilon_t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\Gamma\left(\frac{k}{2}\right)\sqrt{\pi(k-2)}}\left(1 + \frac{\varepsilon_t^2}{k-2}\right)^{-(k+1)/2}, \qquad (12.2.19)$$

where $\Gamma(\cdot)$ is the gamma function. The $t$ distribution converges to the normal distribution as $k$ increases, but has excess kurtosis; indeed its fourth moment is infinite when $k \leq 4$. In a similar spirit Nelson (1991) uses a Generalized Error Distribution, while Engle and Gonzalez-Rivera (1991) estimate the error density nonparametrically.
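In a likelihood like (12.2.16), the normal log density is simply replaced by the log of (12.2.19). The sketch below (hypothetical function name) evaluates it with `math.lgamma` for numerical stability. A quick numerical integration confirms that the density integrates to one and has unit variance, and that for very large $k$ the value at zero approaches the normal value $-\tfrac{1}{2}\log(2\pi)$.

```python
import math
import numpy as np

def std_t_logpdf(eps, k):
    """Log of the unit-variance Student t density (12.2.19); requires k > 2."""
    eps = np.asarray(eps, dtype=float)
    const = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(math.pi * (k - 2)))
    return const - (k + 1) / 2 * np.log1p(eps ** 2 / (k - 2))
```

Inside a GARCH likelihood, the contribution of $\eta_{t+1}$ would be `std_t_logpdf(eta / sigma, k) - np.log(sigma)`, keeping the same Jacobian term as in (12.2.16), with $k$ estimated alongside the other parameters.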

GARCH models can also be estimated by Generalized Method of Moments (GMM). This is appealing when the conditional volatility $\sigma_t^2$ can be
written as a fairly simple function of observed past variables (past squared 
returns and additional variables such as interest rates). Then the model 
implies that squared returns, less the appropriate function of the observed 
variables, are orthogonal to the observed variables. GMM estimation has the 
usual attraction that one need not specify a density for shocks to returns. 


Stochastic-Volatility Models 

Another response to the nonnormality of returns conditional upon past re- 
turns is to assume that there is a random variable conditional upon which 
returns are normal, but that this variable, which we may call stochastic volatility, is not directly observed. This kind of assumption is often made in
continuous-time theoretical models, where asset prices follow diffusions with 
volatility parameters that also follow diffusions. Melino and Turnbull (1990) 
and Wiggins (1987) argue that discrete-time stochastic-volatility models are 
natural approximations to such processes. If we parametrize the discrete- 
time process for stochastic volatility, we then have a filtering problem: to pro- 
cess the observed data to estimate the parameters driving stochastic volatility 
and to estimate the level of volatility at each point in time. 

A simple example of a stochastic-volatility model is the following:

$$ \eta_t = \epsilon_t\, e^{\alpha_t}, \qquad \alpha_t = \phi\,\alpha_{t-1} + \xi_t, \tag{12.2.20} $$

where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$, $\xi_t \sim \mathcal{N}(0, \sigma_\xi^2)$, and we assume that $\epsilon_t$ and $\xi_t$ are serially uncorrelated and independent of each other. Here $\alpha_t$ measures the difference between the conditional log standard deviation of returns and its mean; it follows a zero-mean AR(1) process.

We can rewrite this system by squaring the return equation and taking logs to get

$$ \log(\eta_t^2) = 2\alpha_t + \log(\epsilon_t^2), \qquad \alpha_t = \phi\,\alpha_{t-1} + \xi_t. \tag{12.2.21} $$

This is in linear state-space form except that the first equation of (12.2.21) has an error with a log $\chi^2$ distribution instead of a normal distribution. To appreciate the importance of the nonnormality, one need only consider the fact that when $\epsilon_t$ is very close to zero (an "inlier"), $\log(\epsilon_t^2)$ is a very large negative outlier.
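The size of this nonnormality is easy to quantify. When $\epsilon_t$ is standard normal, $\log(\epsilon_t^2)$ is the log of a $\chi^2_1$ variable, with mean $\psi(1/2) + \log 2 \approx -1.27$ (ψ is the digamma function), variance $\pi^2/2 \approx 4.93$, and a long left tail created by the inliers. A general σ only shifts the mean by $\log\sigma^2$. The simulation below assumes σ = 1 and an arbitrary seed.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(42)
eps = rng.standard_normal(1_000_000)      # epsilon_t with sigma = 1
log_eps2 = np.log(eps**2)

theory_mean = digamma(0.5) + np.log(2.0)  # about -1.2704
theory_var = np.pi**2 / 2                 # about 4.9348

sample_mean = log_eps2.mean()
sample_var = log_eps2.var()
sample_skew = np.mean((log_eps2 - sample_mean) ** 3) / sample_var**1.5

# Inliers: |eps| < 0.01 maps to log(eps^2) below log(1e-4), roughly -9.2.
share_inliers = np.mean(np.abs(eps) < 0.01)
print(sample_mean, theory_mean, sample_var, theory_var, sample_skew, share_inliers)
```

The strongly negative skewness is exactly the "inlier becomes outlier" effect described above.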

The system can be estimated in a variety of ways. Melino and Turnbull (1990) and Wiggins (1987) use GMM estimators. While this is straightforward, it is not efficient. Harvey, Ruiz, and Shephard (1994) suggest a quasi-maximum-likelihood estimator which ignores the nonnormality of $\log(\epsilon_t^2)$ and proceeds as if both equations in (12.2.21) had normal error terms. More recently, Jacquier, Polson, and Rossi (1994) have suggested a Bayesian approach and Shephard and Kim (1994) have proposed a simulation-based exact maximum-likelihood estimator.
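The quasi-maximum-likelihood idea of Harvey, Ruiz, and Shephard (1994) can be sketched with a scalar Kalman filter. The sketch below works with $h_t = 2\alpha_t$, writes $\log(\eta_t^2) = \mu + h_t + u_t$ with $\mu = \log\sigma^2 + \mathrm{E}[\log\chi^2_1]$, treats $u_t$ as if it were normal with the log $\chi^2_1$ variance $\pi^2/2$, and maximizes the resulting Gaussian likelihood numerically. The parameterization (tanh for φ, log for the state-innovation variance), the starting values, and the choice of the Nelder-Mead optimizer are assumptions of this sketch, not details taken from the original paper.

```python
import math

import numpy as np
from scipy.optimize import minimize

LOG_CHI2_VAR = math.pi**2 / 2               # Var[log chi^2_1]
LOG_CHI2_MEAN = -1.2703628454614782         # E[log chi^2_1] = psi(1/2) + log 2


def simulate_sv(T, phi, sigma_xi, sigma=1.0, seed=0):
    """Simulate (12.2.20): eta_t = eps_t * exp(alpha_t), alpha_t = phi*alpha_{t-1} + xi_t."""
    rng = np.random.default_rng(seed)
    alpha = np.empty(T)
    alpha[0] = rng.normal(0.0, sigma_xi / math.sqrt(1 - phi**2))   # stationary start
    for t in range(1, T):
        alpha[t] = phi * alpha[t - 1] + sigma_xi * rng.standard_normal()
    return sigma * rng.standard_normal(T) * np.exp(alpha)


def qml_negloglik(params, y):
    """Gaussian (quasi) negative log-likelihood of y_t = mu + h_t + u_t, h_t = phi*h_{t-1} + v_t."""
    phi, q, mu = math.tanh(params[0]), math.exp(params[1]), params[2]
    if 1 - phi**2 < 1e-8:
        return 1e12                          # keep the search inside the stationary region
    a, P = 0.0, q / (1 - phi**2)             # predicted mean and variance of h_0
    nll = 0.0
    for yt in y:
        v = yt - mu - a                      # one-step-ahead prediction error
        F = P + LOG_CHI2_VAR                 # its variance
        nll += 0.5 * (math.log(2 * math.pi) + math.log(F) + v * v / F)
        K = P / F                            # Kalman gain
        a, P = phi * (a + K * v), phi**2 * P * (1 - K) + q   # update, then predict h_{t+1}
    return nll


def fit_sv_qml(eta):
    y = np.log(eta**2).tolist()              # plain floats keep the Python loop fast
    x0 = [math.atanh(0.5), math.log(0.1), float(np.mean(y))]
    res = minimize(qml_negloglik, x0, args=(y,), method="Nelder-Mead",
                   options={"maxiter": 2000, "xatol": 1e-4, "fatol": 1e-4})
    phi, q, mu = math.tanh(res.x[0]), math.exp(res.x[1]), res.x[2]
    sigma_xi = math.sqrt(q) / 2              # h_t = 2*alpha_t, so Var(v_t) = 4*sigma_xi^2
    sigma = math.exp((mu - LOG_CHI2_MEAN) / 2)
    return phi, sigma_xi, sigma, res


eta = simulate_sv(2500, phi=0.95, sigma_xi=0.25)
phi_hat, sigma_xi_hat, sigma_hat, res = fit_sv_qml(eta)
print(phi_hat, sigma_xi_hat, sigma_hat)
```

Because the filter treats $u_t$ as normal, standard errors from this likelihood would need the robust correction mentioned earlier for quasi-maximum likelihood.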


12.2.2 Multivariate Models 


So far we have considered only the volatility of a single asset return. More generally, we may have a vector of asset returns whose conditional covariance matrix evolves through time. Suppose we have N assets with return innovations $\eta_{it}$, $i = 1, \ldots, N$. We stack these innovations into a vector $\eta_t \equiv [\eta_{1t} \ \cdots \ \eta_{Nt}]'$ and define $\sigma_{iit} \equiv \mathrm{Var}_{t-1}(\eta_{it})$ and $\sigma_{ijt} \equiv \mathrm{Cov}_{t-1}(\eta_{it}, \eta_{jt})$; hence $\Sigma_t \equiv [\sigma_{ijt}]$ is the conditional covariance matrix of all the returns. It is often convenient to stack the nonredundant elements of $\Sigma_t$ (those on and below the main diagonal) into a vector. The operator which performs this stacking is known as the vech operator: $\mathrm{vech}(\Sigma_t)$ is a vector with $N(N+1)/2$ elements.
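A minimal NumPy sketch of the vech operator and its inverse for symmetric matrices follows; the helper names `vech` and `unvech` are our own, not a library API. It stacks the lower triangle column by column, which is the usual convention.

```python
import numpy as np


def vech(S):
    """Stack the elements on and below the diagonal of a square matrix, column by column."""
    S = np.asarray(S)
    rows, cols = np.triu_indices(S.shape[0])
    return S.T[rows, cols]      # upper triangle of S' read row-wise = lower triangle of S by columns


def unvech(v):
    """Inverse of vech for symmetric matrices."""
    v = np.asarray(v)
    n = int(round((np.sqrt(8 * len(v) + 1) - 1) / 2))   # solve n(n+1)/2 = len(v)
    S = np.zeros((n, n), dtype=v.dtype)
    rows, cols = np.triu_indices(n)
    S.T[rows, cols] = v         # writes through the transposed view into the lower triangle of S
    return S + np.tril(S, -1).T  # mirror the strictly lower part into the upper triangle


Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 9.0, 2.0],
                  [0.5, 2.0, 16.0]])
v = vech(Sigma)   # elements sigma_11, sigma_21, sigma_31, sigma_22, sigma_32, sigma_33
print(v, unvech(v))
```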


Multivariate GARCH Models 

Many of the ideas we have considered in a univariate context translate naturally to the multivariate setting. The simplest generalization of the univariate GARCH(1,1) model (12.2.6) relates $\mathrm{vech}(\Sigma_t)$ to $\mathrm{vech}(\eta_{t-1}\eta_{t-1}')$ and to $\mathrm{vech}(\Sigma_{t-1})$:

$$ \mathrm{vech}(\Sigma_t) = \omega + B\,\mathrm{vech}(\Sigma_{t-1}) + A\,\mathrm{vech}(\eta_{t-1}\eta_{t-1}'). \tag{12.2.22} $$

Here ω is a vector with $N(N+1)/2$ elements, and B and A are $N(N+1)/2 \times N(N+1)/2$ matrices; hence the total number of parameters in this model is $N^2(N+1)^2/2 + N(N+1)/2$, which grows with the fourth power of N. It is clear that this model becomes unmanageable very quickly; much of the literature on multivariate GARCH models therefore seeks to place plausible restrictions on (12.2.22) to reduce the number of parameters. Another important goal of the literature is to find restrictions which guarantee that the covariance matrix $\Sigma_t$ is positive definite. Such restrictions are comparatively straightforward in a univariate setting (for example, all the parameters in a univariate GARCH(1,1) model must be positive) but are much less obvious in a multivariate model.

Kroner and Ng (1993) provide a nice survey of the leading multivariate GARCH models. A first specification, the VECH model of Bollerslev, Engle, and Wooldridge (1988) (named after the vech operator), writes the covariance matrix as a set of univariate GARCH models. Each element of $\Sigma_t$ follows a univariate GARCH model driven by the corresponding element of the cross-product matrix $\eta_{t-1}\eta_{t-1}'$. The (i, j) element of $\Sigma_t$ is given by

$$ \sigma_{ijt} = \omega_{ij} + \beta_{ij}\,\sigma_{ij,t-1} + \alpha_{ij}\,\eta_{i,t-1}\eta_{j,t-1}. \tag{12.2.23} $$

This model is obtained from (12.2.22) by making the matrices A and B diagonal. The implied conditional covariance matrix is always positive definite if the matrices of parameters $[\omega_{ij}]$, $[\beta_{ij}]$, and $[\alpha_{ij}]$ are all positive definite. The model has three parameters for each nonredundant element of $\Sigma_t$, and thus $3N(N+1)/2$ parameters in all.

A second specification, the BEKK model of Engle and Kroner (1995) (named after an earlier working paper by Baba, Engle, Kraft, and Kroner), guarantees positive definiteness by working with quadratic forms rather than the individual elements of $\Sigma_t$. The model is

$$ \Sigma_t = CC' + B\,\Sigma_{t-1}B' + A\,\eta_{t-1}\eta_{t-1}'A', \tag{12.2.24} $$

where C is a lower triangular matrix with $N(N+1)/2$ parameters, and B and A are square matrices with $N^2$ parameters each, for a total parameter count of $(5N^2+N)/2$. Weak restrictions on B and A guarantee that $\Sigma_t$ is always positive definite.
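Because $CC'$ is positive definite whenever the lower triangular C has a nonzero diagonal, and the other two terms in (12.2.24) are quadratic forms and hence positive semidefinite, every $\Sigma_t$ produced by the recursion is positive definite. The sketch below simulates a three-asset BEKK model with illustrative parameter values (chosen so that the eigenvalues of $A \otimes A + B \otimes B$ are below one, the covariance-stationarity condition of Engle and Kroner) and checks the smallest eigenvalue over all dates. The starting value $\Sigma_0$ is an arbitrary assumption.

```python
import numpy as np


def simulate_bekk(T, C, A, B, seed=0):
    """Simulate eta_t ~ N(0, Sigma_t) with Sigma_t from the BEKK recursion (12.2.24)."""
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    CC = C @ C.T                          # positive definite when C has a nonzero diagonal
    Sigma = np.empty((T, N, N))
    eta = np.empty((T, N))
    Sigma[0] = CC / (1 - 0.95)            # arbitrary positive definite starting value
    for t in range(T):
        if t > 0:
            e = eta[t - 1][:, None]       # eta_{t-1} as a column vector
            Sigma[t] = CC + B @ Sigma[t - 1] @ B.T + A @ e @ e.T @ A.T
        eta[t] = np.linalg.cholesky(Sigma[t]) @ rng.standard_normal(N)
    return eta, Sigma


C = np.array([[0.30, 0.00, 0.00],
              [0.10, 0.25, 0.00],
              [0.05, 0.05, 0.20]])
A = 0.30 * np.eye(3) + 0.02 * (np.ones((3, 3)) - np.eye(3))
B = 0.90 * np.eye(3)
eta, Sigma = simulate_bekk(2000, C, A, B)
min_eig = np.linalg.eigvalsh(Sigma).min()   # smallest eigenvalue over all dates
print(min_eig)
```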

A special case of the BEKK model is the single-factor GARCH(1,1) model of Engle, Ng, and Rothschild (1990). In this model we define N-vectors λ and w and scalars α and β, and then have

$$ \Sigma_t = CC' + \lambda\lambda'\left[\beta\, w'\Sigma_{t-1}w + \alpha\,(w'\eta_{t-1})^2\right]. \tag{12.2.25} $$

Here C is restricted as in the previous equation. We can impose one normalizing restriction on this model; it is convenient to set $\iota'w = 1$, where ι is a vector of ones. The vector w can then be thought of as a vector of portfolio weights. We define $\eta_{pt} \equiv w'\eta_t$ and $\sigma_{ppt} \equiv w'\Sigma_t w$. Writing $\sigma_{ij}$ for the (i, j) element of $CC'$ and $\sigma_{pp} \equiv w'CC'w$, the model can now be restated as

$$ \sigma_{ijt} = \sigma_{ij} + \lambda_i\lambda_j\left[\beta\,\sigma_{pp,t-1} + \alpha\,\eta_{p,t-1}^2\right], \qquad \sigma_{ppt} = \sigma_{pp} + (w'\lambda)^2\left[\beta\,\sigma_{pp,t-1} + \alpha\,\eta_{p,t-1}^2\right]. \tag{12.2.26} $$

The covariances of any two asset returns move through time only with the variance of the portfolio return, which follows a univariate GARCH(1,1) model. The single-factor GARCH(1,1) model is a special case of the BEKK model where the matrices A and B have rank one: $A = \sqrt{\alpha}\,\lambda w'$ and $B = \sqrt{\beta}\,\lambda w'$. It has $(N^2+5N+2)/2$ free parameters. The model can be extended straightforwardly to allow for multiple factors or a higher-order GARCH structure.
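The claim that the single-factor model is a rank-one BEKK model can be verified numerically: with $A = \sqrt{\alpha}\,\lambda w'$ and $B = \sqrt{\beta}\,\lambda w'$, one has $B\Sigma_{t-1}B' = \beta\,(w'\Sigma_{t-1}w)\,\lambda\lambda'$ and $A\eta_{t-1}\eta_{t-1}'A' = \alpha\,(w'\eta_{t-1})^2\,\lambda\lambda'$. The sketch below draws arbitrary (illustrative) values for λ, C, $\Sigma_{t-1}$, and $\eta_{t-1}$ and checks that the two forms of $\Sigma_t$ agree.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
lam = rng.uniform(0.5, 1.5, N)               # factor loadings lambda (illustrative values)
w = np.full(N, 1.0 / N)                      # portfolio weights normalized so that iota'w = 1
alpha, beta = 0.05, 0.90
C = 0.1 * np.tril(rng.normal(size=(N, N)))
M = rng.normal(size=(N, N))
Sigma_prev = M @ M.T + N * np.eye(N)         # any positive definite Sigma_{t-1}
eta_prev = rng.normal(size=N)                # any eta_{t-1}

# Factor form (12.2.25).
factor_form = C @ C.T + np.outer(lam, lam) * (beta * w @ Sigma_prev @ w
                                              + alpha * (w @ eta_prev) ** 2)

# BEKK form (12.2.24) with rank-one A and B.
A = np.sqrt(alpha) * np.outer(lam, w)
B = np.sqrt(beta) * np.outer(lam, w)
bekk_form = (C @ C.T + B @ Sigma_prev @ B.T
             + A @ np.outer(eta_prev, eta_prev) @ A.T)

print(np.max(np.abs(factor_form - bekk_form)))   # zero up to rounding error
```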

Finally, Bollerslev (1990) has proposed a constant-correlation model in which each asset return variance follows a univariate GARCH(1,1) model and the covariance between any two assets is given by a constant-correlation coefficient multiplying the conditional standard deviations of the returns:

$$ \sigma_{iit} = \omega_{ii} + \beta_{ii}\,\sigma_{ii,t-1} + \alpha_{ii}\,\eta_{i,t-1}^2, \qquad \sigma_{ijt} = \rho_{ij}\sqrt{\sigma_{iit}\,\sigma_{jjt}}. \tag{12.2.27} $$

This model has $N(N+5)/2$ parameters. It gives a positive definite covariance matrix provided that the correlations $\rho_{ij}$ make up a well-defined correlation matrix and the parameters $\omega_{ii}$, $\alpha_{ii}$, and $\beta_{ii}$ are all positive.
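One update of the constant-correlation model (12.2.27) can be sketched as follows; the parameter values and the correlation matrix R are illustrative assumptions. Because lagged shocks enter only through $\eta_{i,t-1}^2$, flipping the sign of one asset's shock leaves $\Sigma_t$ unchanged, which is the property contrasted with the VECH model in the discussion of opposite-signed shocks.

```python
import numpy as np


def ccc_step(sig2_prev, eta_prev, omega, alpha, beta, R):
    """One step of (12.2.27): univariate GARCH(1,1) variances and a constant correlation matrix R."""
    sig2 = omega + beta * sig2_prev + alpha * eta_prev**2   # sigma_{ii,t}, asset by asset
    sd = np.sqrt(sig2)
    Sigma = R * np.outer(sd, sd)                           # sigma_{ij,t} = rho_ij * sd_i * sd_j
    return sig2, Sigma


R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
omega = np.array([0.05, 0.10, 0.02])
alpha = np.array([0.05, 0.08, 0.10])
beta = np.array([0.90, 0.85, 0.88])
sig2_prev = np.array([1.0, 2.0, 0.5])
eta_prev = np.array([-2.0, 1.5, 0.3])

sig2, Sigma = ccc_step(sig2_prev, eta_prev, omega, alpha, beta, R)
print(sig2, np.linalg.eigvalsh(Sigma))
```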

To understand the differences between these models, it is instructive to consider what happens to the conditional covariance between two asset returns after large shocks of opposite signs hit the two assets. In the VECH model with a positive $\alpha_{ij}$ coefficient, the negative cross-product $\eta_{it}\eta_{jt}$ lowers the conditional covariance. In the constant-correlation model, on the other hand, the sign of the cross-product $\eta_{it}\eta_{jt}$ is irrelevant; any event that increases the variances of two positively correlated assets raises the covariance between them. In the factor ARCH model $\sigma_{ijt}$ only moves with $\sigma_{ppt}$, so the effect of a negative cross-product $\eta_{it}\eta_{jt}$ depends on the weights in portfolio p.

As in the univariate case, return volatilities may be persistent in multivariate GARCH models. Multivariate models allow for the possibility that some asset volatilities may share common persistent components; for example, there might be one persistent component in a set of volatilities, so that all changes in one volatility relative to another are transitory. Bollerslev and Engle (1993) explore this idea, which is analogous to the concept of cointegration in the literature on linear unit-root processes.
Multivariate Stochastic-Volatility Models


The univariate stochastic-volatility model given in (12.2.20) is also easily extended to a multivariate setting. We have

$$ \eta_{it} = \epsilon_{it}\, e^{\alpha_{it}} \quad (i = 1, \ldots, N), \qquad \alpha_t = \Phi\,\alpha_{t-1} + \xi_t, \tag{12.2.28} $$

where $\eta_t$, $\epsilon_t$, $\alpha_t$, and $\xi_t$ are now $(N \times 1)$ vectors and Φ is an $(N \times N)$ matrix. This model has $N^2$ parameters in the matrix Φ, $N(N+1)/2$ parameters in the covariance matrix of $\epsilon_t$, and $N(N+1)/2$ parameters in the covariance matrix of $\xi_t$, so the total number of parameters is $N(2N+1)$. There is no need to restrict $\alpha_t$ to be positive, and it is straightforward to estimate the $\epsilon_t$ and $\xi_t$ covariance parameters in square-root form to ensure that the implied covariance matrix is positive definite. Harvey, Ruiz, and Shephard (1994) suggest restricted versions of this model in which Φ is diagonal (reducing the number of parameters to $N(N+2)$) or is even the identity matrix (further reducing the number of parameters to $N(N+1)$).

Even without such extra restrictions, it is important to understand that the specification (12.2.28) imposes constant conditional correlations of asset returns. In this respect it is as restrictive as Bollerslev's (1990) constant-correlation GARCH model, and it has more parameters than that model whenever $N \ge 3$.
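The parameter counts quoted for these multivariate models are easy to tabulate. The helper below (a hypothetical name) simply encodes the formulas given in the text, which makes the growth rates concrete: for N = 10 the unrestricted model (12.2.22) already has 6,105 parameters, against 255 for BEKK and 75 for the constant-correlation model.

```python
def parameter_counts(N):
    """Parameter counts quoted in the text for N assets (exact integer arithmetic)."""
    m = N * (N + 1) // 2                   # number of elements in vech(Sigma_t)
    return {
        "full vech (12.2.22)": m + 2 * m * m,
        "diagonal VECH (12.2.23)": 3 * m,
        "BEKK (12.2.24)": (5 * N * N + N) // 2,
        "single-factor GARCH (12.2.25)": (N * N + 5 * N + 2) // 2,
        "constant correlation (12.2.27)": N * (N + 5) // 2,
        "stochastic volatility (12.2.28)": N * (2 * N + 1),
        "SV, diagonal Phi": N * (N + 2),
        "SV, Phi = I": N * (N + 1),
    }


for N in (2, 10, 50):
    print(N, parameter_counts(N))
```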


A Conditional Market Model
Even the most restrictive of the models we have discussed so far are hard to apply to a large cross-sectional data set because the number of their parameters grows with the square of the number of assets N. The problem is that these models take the whole conditional covariance matrix of returns as the object to be studied. An alternative approach, paralleling the much earlier development of static mean-variance analysis, is to work with a conditional market model. Continuing to ignore nonzero mean returns, we write

$$ \eta_{it} = \beta_{imt}\,\eta_{mt} + \varepsilon_{it}, \tag{12.2.29} $$

where $\beta_{imt} \equiv \sigma_{imt}/\sigma_{mmt}$ is the conditional beta of asset i with the market, and $\varepsilon_{it}$ is an idiosyncratic shock which is assumed to be uncorrelated across assets. Within this framework we might model $\sigma_{mmt}$, the conditional variance of the market return, as a univariate GARCH(1,1) process; we might model $\beta_{imt}$, or equivalently $\sigma_{imt}$, as depending on $\sigma_{mm,t-1}$, $\beta_{im,t-1}$, and the lagged returns $\eta_{i,t-1}$ and $\eta_{m,t-1}$; and we might model the conditional variance of the idiosyncratic shock to return i as another univariate GARCH(1,1) process. The covariance matrix implied by a model of this sort is guaranteed to be positive definite, and the number of parameters in the model grows at rate N rather than $N^2$, which makes the model applicable to much larger numbers of assets. Braun, Nelson, and Sunier (1995) take this approach, using EGARCH functional forms for the individual components of the model.
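Under (12.2.29), with idiosyncratic shocks uncorrelated across assets and with the market shock (as the regression definition of the conditional beta implies), the covariance matrix of the N returns at date t is $\Sigma_t = \sigma_{mmt}\,\beta_t\beta_t' + D_t$, where $\beta_t$ stacks the conditional betas and $D_t$ is the diagonal matrix of idiosyncratic variances. A rank-one positive semidefinite term plus a positive diagonal matrix is positive definite, which is why this approach scales to many assets. The sketch below builds $\Sigma_t$ from made-up values of one date's components.

```python
import numpy as np


def market_model_covariance(sigma_mm, betas, idio_var):
    """Sigma_t = sigma_mm * beta beta' + diag(idiosyncratic variances)."""
    betas = np.asarray(betas, dtype=float)
    return sigma_mm * np.outer(betas, betas) + np.diag(idio_var)


sigma_mm = 0.02                                  # conditional market variance (illustrative)
betas = np.array([0.8, 1.0, 1.3, 0.5, 1.1])      # conditional betas (illustrative)
idio_var = np.array([0.01, 0.02, 0.015, 0.005, 0.03])
Sigma = market_model_covariance(sigma_mm, betas, idio_var)
print(np.linalg.eigvalsh(Sigma).min())           # positive: Sigma is positive definite
```

In a full model each input would come from its own univariate recursion, so the parameter count grows linearly in N.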


12.2.3 Links between First and Second Moments 


We have reviewed some extremely sophisticated models of time-varying second moments in time series whose first moments are assumed to be constant and zero. But the essence of finance theory is that it relates the first and second moments of asset returns. Accordingly we now discuss models in which conditional mean returns may change with the conditional variances and covariances.


The GARCH-M Model 

Engle, Lilien, and Robins (1987) suggest adding a time-varying intercept to the basic univariate model (12.2.2). Writing $r_{t+1}$ for a continuously compounded asset return which is the time series of interest (since we no longer work with a mean-zero innovation), we have

$$ r_{t+1} = \mu_t + \sigma_t\,\epsilon_{t+1}, \qquad \mu_t = \gamma_0 + \gamma_1\,\sigma_t^2, \tag{12.2.30} $$

where $\epsilon_{t+1}$ is an IID random variable as before, and $\sigma_t^2$ can follow any GARCH process. This GARCH-in-mean or GARCH-M model makes the conditional mean of the return linear in the conditional variance. It can be straightforwardly estimated by maximum likelihood, although it is not known whether the model satisfies the regularity conditions for asymptotic normality of the maximum likelihood estimator.


The GARCH-M model can also be specified so that the conditional mean 
is linear in the conditional standard deviation rather than the conditional 
variance. It has been generalized to a multivariate setting by Bollerslev, 
Engle, and Wooldridge (1988) and others, but the number of parameters 
increases rapidly with the number of returns and the model is typically 
applied to only a few assets.


The Instrumental Variables Approach 

As an alternative to the GARCH-M model, Campbell (1987) and Harvey (1989, 1991) have suggested that one can estimate the parameters linking first and second moments by GMM. These authors start with a model for the "market" return that makes the expected market return linear in its own variance, conditional on some vector $H_t$ containing L instruments or forecasting variables:

$$ \mathrm{E}[r_{m,t+1} \mid H_t] = \gamma_0 + \gamma_1\,\mathrm{Var}[r_{m,t+1} \mid H_t]. \tag{12.2.31} $$
Campbell and Harvey assume that conditional expected returns are linear in the instruments and define errors

$$ u_{m,t+1} = r_{m,t+1} - H_t b_m, \qquad e_{m,t+1} = r_{m,t+1} - \gamma_0 - \gamma_1\,(r_{m,t+1} - H_t b_m)^2. \tag{12.2.32} $$

Here $b_m$ is a vector of regression coefficients of the market return on the instruments. The error $u_{m,t+1}$ is the difference between the market return and a linear combination of the instruments, while the error $e_{m,t+1}$ is the difference between the market return and a linear function of $u_{m,t+1}^2$. The model (12.2.31) implies that the errors $u_{m,t+1}$ and $e_{m,t+1}$ are both orthogonal to the instruments $H_t$. With L instruments, there are 2L orthogonality conditions available to estimate L + 2 parameters ($\gamma_0$, $\gamma_1$, and the L coefficients in $b_m$). Thus GMM delivers both parameter estimates and a test for the overidentifying restrictions of the model.
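The orthogonality conditions implied by (12.2.32) can be written down directly. The sketch below, on simulated data with hypothetical names, builds the 2L sample moments and an identity-weighted first-step GMM criterion. A full application would minimize this criterion over $(\gamma_0, \gamma_1, b_m)$, for example with `scipy.optimize.minimize`, and then reweight by the inverse of the estimated moment covariance matrix; the sketch stops at the moments themselves. It assumes that row t of `H` holds $H_t$ and element t of `r_m` holds $r_{m,t+1}$.

```python
import numpy as np


def campbell_harvey_moments(theta, r_m, H):
    """Sample orthogonality conditions implied by (12.2.31)-(12.2.32).

    theta = (gamma0, gamma1, b_m) with b_m of length L; row t of H is H_t and r_m[t] is r_{m,t+1}.
    """
    T, L = H.shape
    gamma0, gamma1, b_m = theta[0], theta[1], np.asarray(theta[2:])
    u = r_m - H @ b_m                        # u_{m,t+1}
    e = r_m - gamma0 - gamma1 * u**2         # e_{m,t+1}
    return np.concatenate([H.T @ u, H.T @ e]) / T   # 2L sample moments


def gmm_objective(theta, r_m, H, W=None):
    """First-step GMM criterion g' W g (identity weighting unless W is supplied)."""
    g = campbell_harvey_moments(theta, r_m, H)
    W = np.eye(len(g)) if W is None else W
    return g @ W @ g


rng = np.random.default_rng(0)
T, L = 1000, 3
H = np.column_stack([np.ones(T), rng.normal(size=(T, L - 1))])   # constant plus two instruments
r_m = H @ np.array([0.005, 0.002, -0.001]) + 0.04 * rng.standard_normal(T)
b_ols = np.linalg.lstsq(H, r_m, rcond=None)[0]
g = campbell_harvey_moments(np.concatenate([[0.0, 2.0], b_ols]), r_m, H)
print(len(g), 2 * L - (L + 2))   # 2L moments and L - 2 overidentifying restrictions
```

At the least-squares value of $b_m$ the first L moments vanish, so the remaining moments are what pin down $\gamma_0$ and $\gamma_1$.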

This approach can easily be generalized to include other assets whose expected returns are given by

$$ \mathrm{E}[r_{i,t+1} \mid H_t] = \gamma_0 + \gamma_1\,\mathrm{Cov}[r_{i,t+1}, r_{m,t+1} \mid H_t]. \tag{12.2.33} $$

If there are N such assets, we define a vector $r_{t+1} \equiv [r_{1,t+1} \ \cdots \ r_{N,t+1}]'$. The conditional expectation of $r_{t+1}$ is given by $\mathrm{E}[r_{t+1}' \mid H_t] = H_t B$, where B is a matrix with NL coefficients. We define errors

$$ u_{t+1} = r_{t+1}' - H_t B, \qquad e_{t+1} = r_{t+1}' - \gamma_0\,\iota' - \gamma_1\,(r_{m,t+1} - H_t b_m)(r_{t+1}' - H_t B), \tag{12.2.34} $$

and we get 2NL extra orthogonality conditions to identify N(L + 1) extra parameters. The total number of orthogonality conditions in (12.2.32) and (12.2.34) is 2(N + 1)L and the total number of parameters is N(L + 1) + L + 2. Thus the model is identified whenever two or more instruments are available.

Harvey (1989) further generalizes the model to allow for a time-varying price of risk. He replaces (12.2.33) by

$$ \mathrm{E}[r_{i,t+1} \mid H_t] = \gamma_0 + \gamma_{1t}\,\mathrm{Cov}[r_{i,t+1}, r_{m,t+1} \mid H_t], \tag{12.2.35} $$

where $\gamma_{1t}$ varies through time but is common to all assets. Since (12.2.35) holds for the market portfolio itself,

$$ \gamma_{1t} = \frac{\mathrm{E}[r_{m,t+1} \mid H_t] - \gamma_0}{\mathrm{Var}[r_{m,t+1} \mid H_t]}, \tag{12.2.36} $$

and Harvey uses this to estimate the model. He substitutes (12.2.36) into (12.2.35), multiplies through by $\mathrm{Var}[r_{m,t+1} \mid H_t]$, and uses $\mathrm{E}[r_{m,t+1} \mid H_t] = H_t b_m$ and $\mathrm{E}[r_{t+1}' \mid H_t] = H_t B$ to construct a new error vector

$$ v_{t+1} = u_{m,t+1}^2\,(H_t B - \gamma_0\,\iota') - u_{m,t+1}\,(r_{t+1}' - H_t B)\,(H_t b_m - \gamma_0), \tag{12.2.37} $$

where $u_{m,t+1} = r_{m,t+1} - H_t b_m$ as in (12.2.32). Harvey replaces $e_{t+1}$ in (12.2.34) with $v_{t+1}$ in (12.2.37) and drops the error $e_{m,t+1}$ in (12.2.32). This gives a system with L fewer orthogonality conditions and one less parameter to estimate (since $\gamma_1$ drops out of the model). The number of overidentifying restrictions declines by L − 1. Harvey (1989) finds some evidence that the price of risk varies when a US stock index is used as the market portfolio; however he also rejects the overidentifying restrictions of the model. Harvey (1991) uses a world stock index as the market portfolio and obtains similar results.


The Conditional CAPM and the Unconditional CAPM 


Equations (12.2.35) and (12.2.36) can be rewritten as

$$ \mathrm{E}[r_{i,t+1} \mid H_t] = \gamma_0 + \beta_{imt}\,\lambda_t, \tag{12.2.38} $$

where $\beta_{imt} \equiv \mathrm{Cov}[r_{i,t+1}, r_{m,t+1} \mid H_t]/\mathrm{Var}[r_{m,t+1} \mid H_t]$, the conditional beta of asset i with the market return, and $\lambda_t \equiv \mathrm{E}[r_{m,t+1} \mid H_t] - \gamma_0$, the expected excess return on the market over a riskless return.

Jagannathan and Wang (1996) emphasize that this conditional version of the CAPM need not imply the unconditional CAPM that was discussed in Chapter 5. If we take unconditional expectations of (12.2.38), we get

$$ \mathrm{E}[r_{i,t+1}] = \gamma_0 + \mathrm{E}[\beta_{imt}]\,\mathrm{E}[\lambda_t] + \mathrm{Cov}[\beta_{imt}, \lambda_t]. \tag{12.2.39} $$

Here $\mathrm{E}[\lambda_t]$ is the unconditional expected excess return on the market. $\mathrm{E}[\beta_{imt}]$ is the unconditional expectation of the conditional beta, which need not be the same as the unconditional beta, although the difference is likely to be small. Most important, the covariance between the conditional beta and the expected excess market return $\lambda_t$ appears in (12.2.39). Assets whose betas are high when the market risk premium is high will have higher unconditional mean returns than would be predicted by the unconditional CAPM. Jagannathan and Wang (1996) argue that the high average returns on small stocks might be explained by this effect if small-stock betas tend to rise at times when the expected excess return on the stock market is high. They present some indirect evidence for this story, although they do not directly model the time-variation of small-stock betas.
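Equation (12.2.39) is the decomposition $\mathrm{E}[\beta\lambda] = \mathrm{E}[\beta]\,\mathrm{E}[\lambda] + \mathrm{Cov}[\beta, \lambda]$ applied to (12.2.38). The simulation below uses illustrative processes, assumed only for this sketch, in which the conditional beta is high when the market premium is high; the covariance term then lifts the unconditional mean return above $\gamma_0 + \mathrm{E}[\beta_{imt}]\,\mathrm{E}[\lambda_t]$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000
gamma0 = 0.003
lam = 0.005 + 0.003 * rng.standard_normal(T)                           # lambda_t (illustrative)
beta = 1.0 + 40.0 * (lam - lam.mean()) + 0.1 * rng.standard_normal(T)  # beta_t rises with lambda_t

cond_mean = gamma0 + beta * lam                        # E[r_{i,t+1} | H_t] from (12.2.38)
cov_beta_lam = np.mean((beta - beta.mean()) * (lam - lam.mean()))

lhs = cond_mean.mean()                                 # sample analogue of E[r_{i,t+1}]
rhs = gamma0 + beta.mean() * lam.mean() + cov_beta_lam   # right-hand side of (12.2.39)
naive = gamma0 + beta.mean() * lam.mean()              # ignores the covariance term
print(lhs, rhs, naive, cov_beta_lam)
```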


Volatility Innovations and Return Innovations 

Empirical researchers have found little evidence that periods of high volatility in stock returns are periods of high expected stock returns. Some papers report weak evidence for this relationship (see Bollerslev, Engle, and Wooldridge [1988], French, Schwert, and Stambaugh [1987], and Harvey [1989]), but other papers which use the short-term nominal interest rate as an instrument find a negative relationship between the mean and volatility of returns (see Campbell [1987] and Glosten, Jagannathan, and Runkle [1993]).

As French, Schwert, and Stambaugh (1987) emphasize, there is much 
stronger evidence that positive innovations to volatility are correlated with 
negative innovations to returns. We have already discussed how asymmetric 
GARCH models can fit this correlation. At a deeper level, it can be explained 
in one of two ways. One possibility is that negative shocks to returns drive 
up volatility. The leverage hypothesis, due originally to Black (1976), says that 
when the total value of a levered firm falls, the value of its equity becomes 
a smaller share of the total. Since equity bears the full risk of the firm, the 
percentage volatility of equity should rise. Even if a firm is not financially 
levered with debt, this may occur if the firm has fixed commitments to 
workers or suppliers. Although there is surely some truth to this story, it is 
hard to account for the magnitude of the return-volatility correlation using 
realistic leverage estimates (see Christie [1982] and Schwert [1989]).

An alternative explanation is that causality runs the other way: Positive 
shocks to volatility drive down returns. Campbell and Hentschel (1992) call 
this the volatility-feedback hypothesis. If expected stock returns increase when 
volatility increases, and if expected dividends are unchanged, then stock 
prices should fall when volatility increases. Campbell and Hentschel build this into a formal model by using the loglinear approximation for returns (7.2.26):


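$$ r_{t+1} = \mathrm{E}_t[r_{t+1}] + \eta_{d,t+1} - \eta_{r,t+1}, \tag{12.2.40} $$

where

$$ \eta_{d,t+1} = \mathrm{E}_{t+1}\Big[\sum_{j=0}^{\infty}\rho^j\,\Delta d_{t+1+j}\Big] - \mathrm{E}_t\Big[\sum_{j=0}^{\infty}\rho^j\,\Delta d_{t+1+j}\Big] $$

is the change in expectations of future dividends in (7.2.25), and

$$ \eta_{r,t+1} = \mathrm{E}_{t+1}\Big[\sum_{j=1}^{\infty}\rho^j\,r_{t+1+j}\Big] - \mathrm{E}_t\Big[\sum_{j=1}^{\infty}\rho^j\,r_{t+1+j}\Big] $$

is the change in expectations of future returns.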

Campbell and Hentschel model the dividend news variable $\eta_{d,t+1}$ as a GARCH(1,1) process* with a zero mean: $\eta_{d,t+1} \sim \mathcal{N}(0, \sigma_t^2)$, where $\sigma_t^2 = \omega + \beta\,\sigma_{t-1}^2 + \alpha\,\eta_{d,t}^2$. They model the expected return as linear in the variance of dividend news: $\mathrm{E}_t[r_{t+1}] = \gamma_0 + \gamma_1\sigma_t^2$. These assumptions imply that the revision in expectations of all future returns is a multiple of today's volatility shock $(\eta_{d,t+1}^2 - \sigma_t^2)$:

$$ \eta_{r,t+1} = \mathrm{E}_{t+1}\Big[\sum_{j=1}^{\infty}\rho^j\,r_{t+1+j}\Big] - \mathrm{E}_t\Big[\sum_{j=1}^{\infty}\rho^j\,r_{t+1+j}\Big] = \theta\,(\eta_{d,t+1}^2 - \sigma_t^2), \tag{12.2.41} $$

where $\theta \equiv \gamma_1\rho\alpha/(1 - \rho(\alpha+\beta))$. The coefficient θ is large when $\gamma_1$ is large (for then expected returns move strongly with volatility), when α is large (for then shocks feed strongly into future volatility), and when α + β is large (for then volatility shocks have persistent effects on expected returns).

* In fact they use a more general asymmetric model, the quadratic GARCH or QGARCH model of Sentana (1991). This is to allow the model to fit asymmetry in returns even in the absence of volatility feedback. However the basic idea is more simply illustrated using a standard GARCH model.
Substituting into (12.2.40), the implied process for returns is

$$ r_{t+1} = \gamma_0 + \gamma_1\sigma_t^2 + \eta_{d,t+1} - \theta\,(\eta_{d,t+1}^2 - \sigma_t^2). \tag{12.2.42} $$

This is not a GARCH process, but a quadratic function of a GARCH process. It implies that returns are negatively skewed because a large negative realization of $\eta_{d,t+1}$ will be amplified by the quadratic term whereas a large positive realization will be damped by the quadratic term. The intuition is that any large shock of either sign raises expected future volatility and required returns, driving down the stock return today. Conversely, "no news is good news": if $\eta_{d,t+1} = 0$, this lowers expected future volatility and raises the stock return today. Campbell and Hentschel find much stronger evidence for a positive price of risk $\gamma_1$ when they estimate the model (12.2.42) than when they simply estimate a standard GARCH-M model. Their results suggest that both the volatility feedback effect and the leverage effect contribute to the asymmetric behavior of stock market volatility.
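The mapping from GARCH parameters to θ, and the negative skewness that (12.2.42) generates, can be checked by simulation. The sketch below uses illustrative parameter values (ρ = 0.96, α = 0.10, β = 0.85, $\gamma_1 = 2$, and ω chosen so that the unconditional variance is 0.0025); none of these values are Campbell and Hentschel's estimates.

```python
import math

import numpy as np


def feedback_theta(gamma1, rho, alpha, beta):
    """theta = gamma1 * rho * alpha / (1 - rho * (alpha + beta)), as defined after (12.2.41)."""
    return gamma1 * rho * alpha / (1.0 - rho * (alpha + beta))


def simulate_feedback_returns(T, gamma0, gamma1, rho, omega, alpha, beta, seed=0):
    """Simulate (12.2.42) with GARCH(1,1) dividend news eta_{d,t+1} ~ N(0, sigma_t^2)."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(T)
    theta = feedback_theta(gamma1, rho, alpha, beta)
    sigma2 = omega / (1.0 - alpha - beta)          # start at the unconditional variance
    r = np.empty(T)
    for t in range(T):
        eta_d = math.sqrt(sigma2) * shocks[t]      # eta_{d,t+1}
        r[t] = gamma0 + gamma1 * sigma2 + eta_d - theta * (eta_d**2 - sigma2)
        sigma2 = omega + beta * sigma2 + alpha * eta_d**2   # sigma_{t+1}^2
    return r


theta = feedback_theta(gamma1=2.0, rho=0.96, alpha=0.10, beta=0.85)
r = simulate_feedback_returns(200_000, gamma0=0.0, gamma1=2.0, rho=0.96,
                              omega=0.000125, alpha=0.10, beta=0.85)
z = (r - r.mean()) / r.std()
skewness = np.mean(z**3)
print(theta, skewness)   # theta is about 2.18; the sample skewness should be negative
```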


12.3 Nonparametric Estimation 


In some financial applications we may be led to a functional relation between two variables Y and X without the benefit of a structural model to restrict the parametric form of the relation. In these situations, we can use nonparametric estimation techniques to capture a wide variety of nonlinearities without recourse to any one particular specification of the nonlinear relation. In contrast to the relatively highly structured or parametric approach to estimating nonlinearities described in Sections 12.1 and 12.2, nonparametric estimation requires few assumptions about the nature of the nonlinearities. However, this is not without cost: nonparametric estimation is highly data-intensive and is generally not effective for smaller sample sizes. Moreover, nonparametric estimation is especially prone to overfitting, a problem that cannot be easily overcome by statistical methods (see Section 12.5 below).
Perhaps the most commonly used nonparametric estimators are smoothing estimators, in which observational errors are reduced by averaging the data in sophisticated ways. Kernel regression, orthogonal series expansion, projection pursuit, nearest-neighbor estimators, average derivative estimators, splines, and artificial neural networks are all examples of smoothing. To understand the motivation for such averaging, suppose that we wish to estimate the relation between two variables $Y_t$ and $X_t$ which satisfy

$$ Y_t = m(X_t) + \epsilon_t, \qquad t = 1, \ldots, T, \tag{12.3.1} $$

where $m(\cdot)$ is an arbitrary fixed but unknown nonlinear function and $\{\epsilon_t\}$ is a zero-mean IID process.

Consider estimating $m(\cdot)$ at a particular date $t_0$ for which $X_{t_0} = x_0$, and suppose that for this one observation $X_{t_0}$ we can obtain repeated independent observations of the variable $Y_{t_0}$, say $y_1, \ldots, y_n$. Then a natural estimator of the function $m(\cdot)$ at the point $x_0$ is

$$ \hat m(x_0) = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{n}\left[m(x_0) + \epsilon_i\right] \tag{12.3.2} $$

$$ = m(x_0) + \frac{1}{n}\sum_{i=1}^{n}\epsilon_i, \tag{12.3.3} $$

and by the Law of Large Numbers, the second term in (12.3.3) becomes negligible for large n.

Of course, if $\{Y_t\}$ is a time series, we do not have the luxury of repeated observations for a given $X_t$. However, if we assume that the function $m(\cdot)$ is sufficiently smooth, then for time-series observations $X_t$ near the value $x_0$, the corresponding values of $Y_t$ should be close to $m(x_0)$. In other words, if $m(\cdot)$ is sufficiently smooth, then in a small neighborhood around $x_0$, $m(\cdot)$ will be nearly constant and $m(x_0)$ may be estimated by taking an average of the $Y_t$'s that correspond to those $X_t$'s near $x_0$. The closer the $X_t$'s are to the value $x_0$, the closer an average of corresponding $Y_t$'s will be to $m(x_0)$. This argues for a weighted average of the $Y_t$'s, where the weights decline as the $X_t$'s get farther away from $x_0$. This weighted average procedure of estimating $m(x)$ is the essence of smoothing. More formally, for any arbitrary x, a smoothing estimator of $m(x)$ may be expressed as

$$ \hat m(x) \equiv \frac{1}{T}\sum_{t=1}^{T}\omega_{t,T}(x)\,Y_t, \tag{12.3.4} $$

where the weights $\{\omega_{t,T}(x)\}$ are large for those $Y_t$'s paired with $X_t$'s near x, and small for those $Y_t$'s with $X_t$'s far from x.

To implement such a procedure, we must define what we mean by "near" and "far". If we choose too large a neighborhood around x to compute the average, the weighted average will be too smooth and will not exhibit the genuine nonlinearities of $m(\cdot)$. If we choose too small a neighborhood around x, the weighted average will be too variable, reflecting noise as well as the variations in $m(\cdot)$. Therefore, the weights $\{\omega_{t,T}(x)\}$ must be chosen carefully to balance these two considerations. We shall address this and other related issues explicitly in Sections 12.3.1 to 12.3.3 and Section 12.5.


12.3.1 Kernel Regression 


An important smoothing technique for estimating $m(\cdot)$ is kernel regression. In the kernel regression model, the weight function $\omega_{t,T}(x)$ is constructed from a probability density function $K(x)$, also called a kernel:

$$ K(x) \ge 0, \qquad \int K(u)\,du = 1. \tag{12.3.5} $$

Despite the fact that $K(x)$ is a probability density function, it plays no probabilistic role in the subsequent analysis; it is merely a convenient method for computing a weighted average, and does not imply, for example, that X is distributed according to $K(x)$ (which would be a parametric assumption).

By rescaling the kernel with respect to a variable h > 0, we can change its spread by varying h if we define:

$$ K_h(u) \equiv \frac{1}{h}K(u/h), \qquad \int K_h(u)\,du = 1. \tag{12.3.6} $$

Now we can define the weight function to be used in the weighted average (12.3.4) as

$$ \omega_{t,h}(x) \equiv K_h(x - X_t)\,/\,g_h(x), \tag{12.3.7} $$

$$ g_h(x) \equiv \frac{1}{T}\sum_{t=1}^{T}K_h(x - X_t). \tag{12.3.8} $$

If h is very small, the averaging will be done with respect to a rather small neighborhood around each of the $X_t$'s. If h is very large, the averaging will be over larger neighborhoods of the $X_t$'s. Therefore, controlling the degree of averaging amounts to adjusting the smoothing parameter h, also known as the bandwidth.* Substituting (12.3.8) into (12.3.4) yields the Nadaraya-Watson kernel estimator $\hat m_h(x)$ of $m(x)$:

$$ \hat m_h(x) = \frac{1}{T}\sum_{t=1}^{T}\omega_{t,h}(x)\,Y_t = \frac{\sum_{t=1}^{T}K_h(x - X_t)\,Y_t}{\sum_{t=1}^{T}K_h(x - X_t)}. \tag{12.3.9} $$

* Choosing the appropriate bandwidth is discussed more fully in Section 12.3.2.
Figure 12.4. Simulation of $Y_t = \mathrm{Sin}(X_t) + 0.5\,\epsilon_t$


Under certain regularity conditions on the shape of the kernel K and the magnitudes and behavior of the weights as the sample size grows, it may be shown that $\hat m_h(x)$ converges to $m(x)$ asymptotically in several ways (see Härdle [1990] for further details). This convergence property holds for a wide class of kernels, but for the remainder of this chapter and in our empirical examples we shall use the most popular choice of kernel, the Gaussian kernel:

$$ K_h(x) = \frac{1}{h\sqrt{2\pi}}\,e^{-x^2/(2h^2)}. \tag{12.3.10} $$


An Illustration of Kernel Regression 

To illustrate the power of kernel regression in capturing nonlinear relations, we apply this smoothing technique to an artificial dataset constructed by Monte Carlo simulation. Denote by $\{X_t\}$ a sequence of 500 observations which take on values between 0 and 2π at evenly spaced increments, and let $\{Y_t\}$ be related to $\{X_t\}$ through the following nonlinear relation:

$$ Y_t = \mathrm{Sin}(X_t) + 0.5\,\epsilon_t, \tag{12.3.11} $$

where $\{\epsilon_t\}$ is a sequence of IID pseudorandom standard normal variates. Using the simulated data $\{X_t, Y_t\}$ (see Figure 12.4), we shall attempt to estimate the conditional expectation $\mathrm{E}[Y_t \mid X_t] = \mathrm{Sin}(X_t)$, using kernel regression. To do this, we apply the Nadaraya-Watson estimator (12.3.9) with a Gaussian kernel to the data, and vary the bandwidth parameter h between $0.1\hat\sigma_x$ and $0.5\hat\sigma_x$, where $\hat\sigma_x$ is the sample standard deviation of $\{X_t\}$. By varying h in units of standard deviation, we are implicitly normalizing the explanatory variable $X_t$ by its own standard deviation, as (12.3.10) suggests.

For each value of h, we plot the kernel estimator as a function of $X_t$, and these plots are given in Figures 12.5a to 12.5c. Observe that for a bandwidth of $0.1\hat\sigma_x$, the kernel estimator is too choppy: the bandwidth is too small to provide sufficient local averaging to recover $\mathrm{Sin}(X_t)$. While the kernel estimator does pick up the cyclical nature of the data, it is also picking up random variations due to noise, which may be eliminated by increasing the bandwidth and consequently widening the range of local averaging.

Figure 12.5b shows the kernel estimator for a larger bandwidth of $0.3\hat\sigma_x$, which is much smoother and a closer fit to the true conditional expectation. As the bandwidth is increased, the local averaging is performed over successively wider ranges, and the variability of the kernel estimator (as a function of x) is reduced. Figure 12.5c plots the kernel estimator with a bandwidth of $0.5\hat\sigma_x$, which is too smooth since some of the genuine variation of the sine function has been eliminated along with the noise. In the limit, the kernel estimator approaches the sample average of $\{Y_t\}$, and all the variability of $Y_t$ as a function of $X_t$ is lost.
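The illustration can be reproduced in outline with a few lines of NumPy. The sketch below implements (12.3.9) with the Gaussian kernel (12.3.10) and evaluates the fit at bandwidths of 0.1, 0.3, and 0.5 sample standard deviations. It draws its own pseudorandom noise with an arbitrary seed, so it will not match Figures 12.4 and 12.5 point for point. As a summary of "choppiness" it reports the total variation of each fitted curve, a measure chosen for this sketch only.

```python
import numpy as np


def nadaraya_watson(x_eval, X, Y, h):
    """Nadaraya-Watson estimator (12.3.9) with the Gaussian kernel (12.3.10)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    u = (x_eval[:, None] - X[None, :]) / h
    w = np.exp(-0.5 * u**2)          # the factor 1/(h*sqrt(2*pi)) cancels in the ratio
    return (w @ Y) / w.sum(axis=1)


rng = np.random.default_rng(0)
X = np.linspace(0.0, 2 * np.pi, 500)            # 500 evenly spaced points on [0, 2*pi]
Y = np.sin(X) + 0.5 * rng.standard_normal(500)  # the relation (12.3.11)
sx = X.std(ddof=1)                              # sample standard deviation of X

fits = {c: nadaraya_watson(X, X, Y, c * sx) for c in (0.1, 0.3, 0.5)}
roughness = {c: np.abs(np.diff(f)).sum() for c, f in fits.items()}
print(roughness)   # total variation of the fitted curve falls as the bandwidth grows
```

The two limits in the text are easy to see with this function: a bandwidth far below the grid spacing reproduces the data exactly, and a very large bandwidth returns the sample mean of Y at every point.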


12.3.2 Optimal Bandwidth Selection 


It is apparent from the example in Section 12.3.1 that choosing the proper bandwidth is critical in any application of kernel regression. There are several methods for selecting an optimal bandwidth; the most common of these is the method of cross-validation, popular because of its robustness and asymptotic optimality (see Härdle [1990, Chapter 5] for further details). In this approach, the bandwidth is chosen to minimize a weighted-average squared error of the kernel estimator. In particular, for a sample of T observations $\{X_t, Y_t\}$, let

$$ \hat m_{h,j}(X_j) = \frac{\sum_{t \ne j}K_h(X_j - X_t)\,Y_t}{\sum_{t \ne j}K_h(X_j - X_t)}, \tag{12.3.12} $$

which is simply the kernel estimator based on the dataset with observation j deleted, evaluated at the jth observation $X_j$. Then the cross-validation function $\mathrm{CV}(h)$ is defined as

$$ \mathrm{CV}(h) \equiv \frac{1}{T}\sum_{t=1}^{T}\left[Y_t - \hat m_{h,t}(X_t)\right]^2\delta(X_t), \tag{12.3.13} $$

where $\delta(X_t)$ is a nonnegative weight function that is required to reduce boundary effects (see Härdle [1990, p. 162] for further discussion). The function $\mathrm{CV}(h)$ is called the cross-validation function because it validates the success of the kernel estimator in fitting $\{Y_t\}$ across the T subsamples $\{X_t, Y_t\}_{t \ne j}$, each with one observation omitted. The optimal bandwidth is the one that minimizes this function.

Figure 12.5. Kernel Estimation
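A minimal sketch of cross-validated bandwidth choice for the sine example follows. It computes the leave-one-out fits (12.3.12) by zeroing the diagonal of the kernel-weight matrix and evaluates (12.3.13) on a grid of bandwidths. The boundary weight δ(·), an indicator that drops the outer 5% of the X values at each end, the grid itself, and the random seed are all assumptions of this sketch rather than choices taken from the text.

```python
import numpy as np


def loo_cv(X, Y, h, trim=0.05):
    """Cross-validation function CV(h) of (12.3.13) for the Gaussian-kernel Nadaraya-Watson fit."""
    u = (X[:, None] - X[None, :]) / h
    W = np.exp(-0.5 * u**2)              # kernel weights; normalizing constants cancel
    np.fill_diagonal(W, 0.0)             # leave observation j out when predicting Y_j, as in (12.3.12)
    m_loo = (W @ Y) / W.sum(axis=1)
    lo, hi = np.quantile(X, [trim, 1.0 - trim])
    delta = ((X >= lo) & (X <= hi)).astype(float)   # assumed boundary weight delta(X)
    return np.mean((Y - m_loo) ** 2 * delta)


rng = np.random.default_rng(0)
X = np.linspace(0.0, 2 * np.pi, 500)
Y = np.sin(X) + 0.5 * rng.standard_normal(500)
sx = X.std(ddof=1)

grid = sx * np.linspace(0.05, 1.0, 40)   # candidate bandwidths in units of the std. deviation
cv = np.array([loo_cv(X, Y, h) for h in grid])
h_opt = grid[np.argmin(cv)]
print(h_opt / sx)   # cross-validated bandwidth in units of the sample standard deviation
```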


12.3.3 Average Derivative Estimators


For many financial applications, we wish to relate Y_t to several variables
X_{1t}, ..., X_{kt} nonparametrically. For example, we may wish to model the
expected returns of stocks and bonds as a nonlinear function of several
factors: the market return, interest rate spreads, dividend yield, etc. (see Lo
and MacKinlay [1996]). Such a task is considerably more ambitious than the
univariate example of Section 12.3.1. To see why, consider the case of five
independent variables and, without loss of generality, let these five variables
all take on values in the interval [0, 1]. Even if we divide the domain of
each variable into only ten equally spaced pieces, this would yield a total
of 10^5 = 100,000 neighborhoods, each of width 0.10; hence we would need
at least 100,000 observations to ensure an average of just one data point
per neighborhood! This curse of dimensionality can only be solved by placing
restrictions on the kinds of nonlinearities that are allowable.
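The arithmetic behind this example is easy to tabulate. The tiny sketch below uses a function name of our own. It counts the cells produced by cutting each axis of the unit cube into ten equal pieces, and that count is also the number of observations needed for an average of one data point per cell:

```python
def neighborhoods(num_vars: int, pieces_per_axis: int = 10) -> int:
    """Number of cells from cutting each of num_vars unit intervals into equal pieces."""
    return pieces_per_axis ** num_vars


for k in range(1, 6):
    # neighborhoods(k) is also the number of observations needed for an
    # average of one data point per cell
    print(k, neighborhoods(k))
```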

For example, suppose a linear combination of the X_{it}'s is related to Y_t
nonparametrically. This has the advantage of capturing important nonlinearities while providing sufficient structure to permit estimation with
reasonable sample sizes. Specifically, consider the following multivariate
nonlinear model:

$$Y_t = m(X_t'\beta) + \epsilon_t, \qquad E[\epsilon_t \mid X_t] = 0, \qquad (12.3.14)$$


where X_t ≡ [X_{1t} ... X_{kt}]' is now a (k × 1) vector and m(·) is some arbitrary
but fixed nonlinear function. The function m(·) may be estimated by the
following two-step procedure: (1) estimate β with an average derivative
estimator β̂; and (2) estimate m(·) with a kernel regression of Y_t on X_t'β̂.

Stoker (1986) observes that the coefficients of (12.3.14) may be estimated up to a scale factor by ordinary least squares if either of the following
two conditions is true: (1) the X_t's are multivariate normal vectors; or, more
generally, (2) E[X_{it} | X_t'β] is linear in X_t'β for i = 1, ..., k.¹¹ If neither
of these conditions holds, Stoker (1986) proposes an ingenious estimator,
the average derivative estimator, which can estimate β consistently (see also
Stoker [1992]).


¹¹This second condition is satisfied by multivariate normal X_t, but is also satisfied for non-normal elliptically symmetric distributions. See Chamberlain (1983b), Chung and Goldberger (1984), Deaton and Irish (1984), and Ruud (1983).
The estimator exploits the fact that the expectation of the derivative of m(·) with respect to X_t is proportional to β:

$$E\left[\frac{\partial m(X_t'\beta)}{\partial X_t}\right] = E\left[m'(X_t'\beta)\right]\beta. \qquad (12.3.15)$$

Therefore, an estimator of the average derivative is equivalent to an estimator of β up to a scale factor, and this scale factor is irrelevant for our
purposes since it may be subsumed by m(·) and consistently estimated by
kernel regression.

There are several average derivative estimators available: the direct, indirect, and slope estimators. Stoker (1991, Theorem 1) shows that they are
all asymptotically equivalent; however, Stoker (1992, Chapter 3) favors the
indirect slope estimator (ISE) for two reasons. First, if the relation between Y_t
and X_t is truly linear, the indirect slope estimator is still unbiased, whereas
the others are not. Second, the indirect slope estimator requires less precision from its nonparametric component estimators because of the ISE's
ratio form (see below).

Heuristically, the indirect slope estimator β̂_ISE exploits the fact that the
unknown parameter vector β is proportional to the covariance between
the dependent variable Y_t and the negative of the derivative of the logarithm of the marginal density of the independent variables X_t, denoted by ℓ(·).
Therefore, by estimating Cov[Y_t, ℓ(X_t)], we obtain a consistent estimator of
β up to scale. This covariance may be estimated by computing the sample
covariance between Y_t and the sample counterpart to ℓ(·).

More formally, β̂_ISE may be viewed as an instrumental variables (IV)
estimator (see Section A.1 of the Appendix) of the regression of Y_t on X_t
with the instrument matrix H:

$$\hat\beta_{\mathrm{ISE}} = (H'X)^{-1}H'Y, \qquad (12.3.16)$$

where Y ≡ [Y_1 ... Y_T]',

$$H \equiv \begin{bmatrix} 1 & \hat\ell(X_1)'\,I_1(X_1) \\ \vdots & \vdots \\ 1 & \hat\ell(X_T)'\,I_T(X_T) \end{bmatrix}, \qquad X \equiv \begin{bmatrix} 1 & X_1' \\ \vdots & \vdots \\ 1 & X_T' \end{bmatrix}, \qquad (12.3.17)$$

ℓ̂(·) is an estimator of the negative of the derivative of the log of the marginal
density of X_t, and I_t(x) is an indicator function that trims a portion of the
sample with estimated marginal densities lower than a fixed constant b:

$$I_t(x) = 1\left[\hat f(x) > b\right]. \qquad (12.3.18)$$

In most empirical applications, the constant b is set so that between 1% and
5% of the sample is trimmed.

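To obtain ℓ̂(·), observe that if f(x) denotes the marginal density of X_t,
then the Gaussian kernel estimator of f(x) is given by:¹²

$$\hat f(x) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{h^k}\,K\!\left(\frac{x - X_t}{h}\right), \qquad (12.3.19)$$

where

$$K\!\left(\frac{x - X_t}{h}\right) = \prod_{i=1}^{k}\phi\!\left(\frac{x_i - X_{it}}{h}\right) \qquad (12.3.20)$$

$$= (2\pi)^{-k/2}\exp\left[-\frac{1}{2h^2}(x - X_t)'(x - X_t)\right], \qquad (12.3.21)$$

and φ(·) is the standard normal density. Therefore, we have

$$\hat f'(x) \equiv \frac{\partial \hat f(x)}{\partial x} = -\frac{1}{T h^{k+2}}\sum_{t=1}^{T}(x - X_t)\,K\!\left(\frac{x - X_t}{h}\right), \qquad (12.3.22)$$

and we can define ℓ̂(x) to be

$$\hat\ell(x) \equiv -\frac{\hat f'(x)}{\hat f(x)}. \qquad (12.3.24)$$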


Despite the multivariate nature of f̂(·), observe that there is still only
a single bandwidth h to adjust in the kernel estimator (12.3.19). As in the
univariate case, the bandwidth controls the degree of local averaging, but
now over multidimensional neighborhoods. As a practical matter, the numerical properties of this local averaging procedure may be improved by
normalizing all the X_it's by their own standard deviations before computing
f̂(·) and then multiplying each of the ℓ̂_i's by the standard deviation of the
corresponding X_i to undo the normalization.
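The two-step procedure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not Stoker's or the authors' code. It standardizes the regressors before computing f̂(·) and ℓ̂(·) at the sample points, sets b to the 5% quantile of f̂(X_t), uses a hand-picked bandwidth h = 0.5 instead of a cross-validated one, and finishes with a Nadaraya-Watson regression of Y_t on the estimated index. The IV system (12.3.16) is exactly identified, so rescaling the columns of ℓ̂ (for instance, to undo the standardization) leaves β̂_ISE unchanged. For that reason the sketch keeps the regressors in their original units. The data-generating process Y = tanh(X'β) + noise is hypothetical.

```python
import numpy as np


def kde_and_score(Z, h):
    """Gaussian kernel density f_hat and l_hat = -f_hat'/f_hat at the sample points.

    Z is a (T, k) array; follows (12.3.19), (12.3.21), (12.3.22), and (12.3.24).
    """
    T, k = Z.shape
    diff = Z[:, None, :] - Z[None, :, :]                    # x - X_t for every pair
    K = (2 * np.pi) ** (-k / 2) * np.exp(-np.sum(diff**2, axis=2) / (2 * h**2))
    f_hat = K.sum(axis=1) / (T * h**k)                      # (12.3.19)
    f_grad = -(diff * K[:, :, None]).sum(axis=1) / (T * h ** (k + 2))   # (12.3.22)
    return f_hat, -f_grad / f_hat[:, None]                  # (12.3.24)


def indirect_slope_estimator(Y, X, h=0.5, trim=0.05):
    """beta_hat_ISE = (H'X)^{-1} H'Y with the trimmed instrument matrix of (12.3.17)."""
    T = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0)                # standardize before the KDE
    f_hat, l_hat = kde_and_score(Z, h)
    keep = (f_hat > np.quantile(f_hat, trim)).astype(float)  # I_t(X_t), (12.3.18)
    H = np.column_stack([np.ones(T), l_hat * keep[:, None]])
    Xc = np.column_stack([np.ones(T), X])                   # regressors in original units
    coef = np.linalg.solve(H.T @ Xc, H.T @ Y)               # (12.3.16)
    return coef[1:]                                         # slopes, proportional to beta


def index_regression(Y, X, beta_hat, h=0.3):
    """Step 2: Nadaraya-Watson regression of Y on the standardized index X'beta_hat."""
    v = X @ beta_hat
    v = (v - v.mean()) / v.std()                            # the scale of beta is arbitrary
    w = np.exp(-0.5 * ((v[:, None] - v[None, :]) / h) ** 2)
    return (w @ Y) / w.sum(axis=1)


# Hypothetical example: Y = tanh(X'beta) + noise with beta = (1, -0.5).
rng = np.random.default_rng(1)
T = 1000
X = rng.standard_normal((T, 2))
beta = np.array([1.0, -0.5])
Y = np.tanh(X @ beta) + 0.1 * rng.standard_normal(T)
b_ise = indirect_slope_estimator(Y, X)
m_hat = index_regression(Y, X, b_ise)
print("estimated slope ratio:", b_ise[1] / b_ise[0])       # beta is identified up to scale
```

Only ratios of the slope estimates are meaningful here. Any scale factor is absorbed by the nonparametric m(·) in the second step.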



¹²Note that the bandwidth h implicit in f̂(·) is, in general, different from the bandwidth of the nonparametric estimator of m(·) in (12.3.14). Cross-validation techniques may be used to select both; however, this may be computationally too demanding, and simple rules of thumb may suffice.
12.3.4 Application: Estimating State-Price Densities


One of the most important theoretical advances in the economics of investment under uncertainty is the time-state preference model of Arrow (1964)
and Debreu (1959), in which they introduce primitive securities, each paying $1 in one specific state of nature and nothing otherwise. Now known
as Arrow-Debreu securities, they are the fundamental building blocks from
which we have derived much of our current understanding of economic
equilibrium in an uncertain environment.

In practice, since true Arrow-Debreu securities are not yet traded on any
organized exchange, Arrow-Debreu prices are not observable.¹³ However,
using nonparametric techniques (specifically, multivariate kernel regression), Ait-Sahalia and Lo (1996) develop estimators for such prices, known
as a state-price density (SPD) in the continuous-state case. The SPD contains a
wealth of information concerning the pricing (and hedging) of risky assets
in an economy. In principle, it can be used to price other assets, even assets
that are currently not traded (see Ait-Sahalia and Lo [1995] for examples).

More importantly, SPDs contain much information about preferences and
asset price dynamics. For example, if parametric restrictions are imposed
on the data-generating process of asset prices, the SPD estimator may be
used to infer the preferences of the representative agent in an equilibrium
model of asset prices (see, for example, Bick [1990] and He and Leland
[1993]). Alternatively, if specific preferences are imposed, the SPD estimator may be used to infer the data-generating process of asset prices (see,
for example, Derman and Kani [1994], Dupire [1994], Jackwerth and Rubinstein [1995], Longstaff [1992, 1994], Rady [1994], Rubinstein [1985],
and Shimko [1991, 1993]). Indeed, Rubinstein (1985) has observed that
any two of the following imply the third: (1) the representative agent's
preferences; (2) asset price dynamics; and (3) the SPD.


Definition of the State-Price Density 

To define the SPD formally, consider a standard dynamic exchange economy
(see Chapter 8) in which the equilibrium price P_t of a security at date t with
a single liquidating payoff Y(C_T) at date T that is a function of aggregate
consumption C_T is given by:

$$P_t = E_t\left[Y(C_T)\,M_{t,T}\right], \qquad M_{t,T} \equiv \delta^{T-t}\,\frac{U'(C_T)}{U'(C_t)}, \qquad (12.3.25)$$


¹³This may soon change with the advent of supershares, first proposed by Garman (1978) and Hakansson (1976, 1977) and currently under development by Leland O'Brien Rubinstein Associates, Inc. See Mason, Merton, Perold, and Tufano (1995) for further details.

¹⁴Of course, markets must be dynamically complete for such prices to be meaningful; see, for example, Constantinides (1982). This assumption is almost always adopted, either explicitly or implicitly, in parametric derivative pricing models, and we adopt it as well.
where M_{t,T} is the marginal rate of substitution between dates t and T, and
δ is the rate of time preference. This well-known equilibrium asset-pricing
relation equates the current price of the security to its expected future payoff,
discounted using the stochastic discount factor M_{t,T}.

Lucas (1978) observes that (12.3.25) need not imply a martingale process for {P_t}, supporting LeRoy's (1973) contention that the martingale property is neither a necessary nor a sufficient condition for rationally determined
asset prices. However, assuming that the conditional distribution of future
consumption has a density representation f_t(C_T), the conditional expectation
in (12.3.25) can be re-expressed in the following way:


$$P_t = \int Y(C_T)\,\delta^{T-t}\,\frac{U'(C_T)}{U'(C_t)}\,f_t(C_T)\,dC_T \qquad (12.3.26)$$

$$= e^{-r_{t,T}(T-t)}\int Y(C_T)\,f_t^*(C_T)\,dC_T \qquad (12.3.27)$$

$$= e^{-r_{t,T}(T-t)}\,E_t^*\left[Y(C_T)\right], \qquad (12.3.28)$$

where

$$f_t^*(C_T) \equiv \frac{M_{t,T}\,f_t(C_T)}{\int M_{t,T}\,f_t(C_T)\,dC_T}, \qquad (12.3.29)$$

and r_{t,T} is the continuously compounded net rate of return between t and
T of an asset promising one unit of consumption at T, i.e., it is the return
on the riskless asset.

This version of the Euler equation shows that an asset's current price can
be expressed as its expected payoff, discounted at the riskless rate
of interest (see Chapter 8 for a more detailed discussion). However, the expectation is taken with respect to the SPD f*, a marginal-rate-of-substitution-weighted probability density function, not the original probability density
function f of future consumption. In a continuous-time setting, f* is also
known as the risk-neutral pricing density (Cox and Ross [1976]) or the equivalent martingale measure (Harrison and Kreps [1979]).¹⁵
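To see (12.3.26) through (12.3.29) at work, the following sketch builds an SPD numerically in a hypothetical economy. Every feature of that economy is an assumption made for the illustration: power utility with U'(C) = C^(−γ), lognormal consumption growth, and the parameter values shown. The sketch checks that pricing with the marginal rate of substitution and pricing with the discounted SPD give the same answer. It also checks that the implied riskless rate matches the familiar lognormal closed form −log δ + γμ − γ²σ²/2 (per unit of time).

```python
import numpy as np

# Hypothetical economy: CRRA marginal utility and lognormal consumption growth.
gamma, delta, C_t, tau = 3.0, 0.98, 1.0, 1.0
mu, sigma = 0.02, 0.10           # mean and volatility of log consumption growth

C = np.linspace(0.3, 2.5, 20001)                  # grid of future consumption C_T
dC = C[1] - C[0]
z = (np.log(C / C_t) - mu * tau) / (sigma * np.sqrt(tau))
f = np.exp(-0.5 * z**2) / (C * sigma * np.sqrt(2 * np.pi * tau))   # f_t(C_T)
M = delta**tau * (C / C_t) ** (-gamma)            # M_{t,T} from (12.3.25)

bond = np.sum(M * f) * dC                         # price of one unit of consumption at T
r = -np.log(bond) / tau                           # continuously compounded riskless rate
f_star = M * f / bond                             # SPD, (12.3.29)

Y = np.maximum(C - 1.0, 0.0)                      # an arbitrary payoff Y(C_T)
price_mrs = np.sum(Y * M * f) * dC                # (12.3.26)
price_spd = np.exp(-r * tau) * np.sum(Y * f_star) * dC   # (12.3.27)
print(price_mrs, price_spd, r)
```

The denominator of (12.3.29) is exactly the price of the riskless discount bond. That is why discounting at r_{t,T} and reweighting by f* reproduce the original price.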

Once f* is obtained, it can be used to price any asset at date t with a single
liquidating payoff at date T that is an arbitrary function of consumption
C_T.¹⁶ SPDs also provide the link between preference-based equilibrium
models of the type discussed in Chapter 8 and arbitrage-based derivative
pricing models of the type discussed in Chapter 9. Indeed, implicit in the
prices of all financial securities (derivatives or not) are the prices of Arrow-Debreu securities, and these prices may be used to value all other securities,
no matter how complex.

¹⁵See Huang and Litzenberger (1988, Chapter 5) for a more detailed discussion of SPDs.

¹⁶Securities with multiple payoffs and infinite horizons can also be priced by the SPD, but in these cases the SPD must be appropriately redefined to capture the time-variation in the marginal rates of substitution; see Breeden and Litzenberger (1978) and Radner (1982) for further discussion.


Pricing Derivatives with SPDs 
Under some regularity conditions, we may express f* as an explicit function
of t and T so that a single SPD f*(C_T; t, T) may be used to price an asset at
any date t with a single liquidating payoff Y(C_T) at any future date T > t
(see footnote 16):

$$P_t = e^{-r_{t,T}(T-t)}\int Y(C_T)\,f^*(C_T; t, T)\,dC_T, \qquad (12.3.30)$$

and we shall adopt this convention for notational simplicity. For example, a
European call option on date-T aggregate consumption C_T with strike price
X has a payoff function Y(C_T) = max[C_T − X, 0], and hence its date-t price
G_t is simply

$$G_t = e^{-r_{t,T}(T-t)}\int \max[C_T - X, 0]\,f^*(C_T; t, T)\,dC_T. \qquad (12.3.31)$$
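Equation (12.3.31) can be checked numerically. The sketch below assumes a lognormal SPD, which is what geometric Brownian motion implies under risk-neutral pricing. It evaluates the integral with a simple Riemann sum and compares the result with the Black-Scholes formula. The underlying is generic (S plays the role of today's value of the asset on which the call is written), and the parameter values are arbitrary.

```python
import math

import numpy as np


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S, X, tau, r, sigma):
    d1 = (math.log(S / X) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d2)


def lognormal_spd(c, S, tau, r, sigma):
    """Risk-neutral density of the date-T value when log returns are normal."""
    m = math.log(S) + (r - 0.5 * sigma**2) * tau
    s = sigma * math.sqrt(tau)
    return np.exp(-((np.log(c) - m) ** 2) / (2 * s**2)) / (c * s * math.sqrt(2 * math.pi))


def price_with_spd(payoff, spd, grid, r, tau):
    """(12.3.30): discounted expectation of the payoff under the SPD (Riemann sum)."""
    dc = grid[1] - grid[0]
    return math.exp(-r * tau) * float(np.sum(payoff(grid) * spd(grid)) * dc)


S, X, tau, r, sigma = 100.0, 105.0, 0.5, 0.05, 0.2
grid = np.linspace(1e-3, 400.0, 400001)
call_spd = price_with_spd(
    lambda c: np.maximum(c - X, 0.0),                   # call payoff, (12.3.31)
    lambda c: lognormal_spd(c, S, tau, r, sigma),
    grid, r, tau,
)
print(call_spd, black_scholes_call(S, X, tau, r, sigma))
```

Replacing the payoff function prices any other path-independent claim with the same SPD.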


Even the most complex path-independent derivative security can be priced 
and hedged according to (12.3.30). For example, consider a security with 
the highly nonlinear payoff function: 


a-b 
а> 0, b <0 , 02.3.32) 
1 
а = c+ 8 log(—a/b). (12.3.33) 


This payoff function is a smoothed version of the payoff to an option portfolio commonly known as a bullish vertical spread, in which a call option with a low
strike price is purchased and a call option with a high strike price is written (see
Figure 12.6 and Cox and Rubinstein [1985, Chapter 1] for further details).

Figure 12.6. Bullish Vertical Spread Payoff Function and Smoothed Version


Extracting SPDs from Derivatives Prices

There is an even closer relation between option prices and SPDs than
(12.3.30) suggests, which Ross (1976), Banz and Miller (1978), and Breeden
and Litzenberger (1978) first discovered. In particular, they show that the
second derivative of the call-pricing function G_t with respect to the strike
price X must equal the SPD, discounted at the riskless rate:

$$\frac{\partial^2 G_t}{\partial X^2} = e^{-r_{t,T}(T-t)}\,f^*(X; t, T). \qquad (12.3.34)$$

Therefore, impounded in every option pricing formula is the SPD f*.

To estimate the SPD using (12.3.34), we require a call option pricing formula. Although many parametric pricing formulas exist (see Hull
[1993, Chapter 17] for some popular examples), Ait-Sahalia and Lo (1996)
construct a nonparametric pricing formula that places fewer restrictions
(mainly smoothness and weak dependence) on the data-generating process of the underlying asset's price. While parametric formulas such as
those of Black and Scholes (1973) and Merton (1973) offer great advantages when the parametric assumptions (e.g., geometric Brownian motion)
are satisfied, nonparametric methods are robust to violations of these assumptions. Since there is some empirical evidence that casts doubt on such
assumptions, at least for stock indexes,¹⁷ the nonparametric approach may
have some important advantages.¹⁸
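The Breeden-Litzenberger relation (12.3.34) is easy to verify with a parametric formula. This sketch assumes Black-Scholes call prices purely for illustration. It takes the second strike derivative by central finite differences and multiplies by e^{rτ}, and the result should reproduce the lognormal density implied by geometric Brownian motion. A nonparametric call-pricing function would simply take the place of `bs_call`.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def bs_call(S, X, tau, r, sigma):
    """Black-Scholes call price, vectorized over the strike X."""
    sd = sigma * math.sqrt(tau)
    d1 = (np.log(S / X) + (r + 0.5 * sigma**2) * tau) / sd
    return S * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d1 - sd)


def spd_from_calls(call_fn, strikes, dX, r, tau):
    """(12.3.34): f*(X) = exp(r tau) d^2 G / dX^2, via central second differences."""
    second = (call_fn(strikes + dX) - 2.0 * call_fn(strikes) + call_fn(strikes - dX)) / dX**2
    return math.exp(r * tau) * second


S, tau, r, sigma = 100.0, 0.5, 0.05, 0.2
strikes = np.linspace(40.0, 200.0, 1601)
f_bl = spd_from_calls(lambda k: bs_call(S, k, tau, r, sigma), strikes, 0.01, r, tau)

# Lognormal risk-neutral density implied by geometric Brownian motion, for comparison.
m, s = math.log(S) + (r - 0.5 * sigma**2) * tau, sigma * math.sqrt(tau)
f_lognormal = np.exp(-((np.log(strikes) - m) ** 2) / (2 * s**2)) / (
    strikes * s * math.sqrt(2 * math.pi)
)
print("max abs difference:", np.max(np.abs(f_bl - f_lognormal)))
```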

Given observed call option prices {G_i, X_i, τ_i} (where τ_i ≡ T_i − t_i), the
prices of the underlying asset {P_{t_i}}, and the riskless rate of interest {r_{t_i,τ_i}}, we
may construct the smooth nonparametric call-pricing function as

$$\hat G(P_t, X, \tau, r_{t,\tau}) = \hat E\left[G_i \mid P_t, X, \tau, r_{t,\tau}\right] \qquad (12.3.35)$$

using a multivariate kernel K, formed as a product of d = 4 univariate kernels:

$$K(P_t, X, \tau, r_{t,\tau}) = k_{h_P}(P_t)\,k_{h_X}(X)\,k_{h_\tau}(\tau)\,k_{h_r}(r_{t,\tau}), \qquad (12.3.36)$$


and hence

$$\hat G(P_t, X, \tau, r_{t,\tau}) = \frac{\sum_{i=1}^{n} k_{h_P}(P_t - P_{t_i})\,k_{h_X}(X - X_i)\,k_{h_\tau}(\tau - \tau_i)\,k_{h_r}(r_{t,\tau} - r_{t_i,\tau_i})\,G_i}{\sum_{i=1}^{n} k_{h_P}(P_t - P_{t_i})\,k_{h_X}(X - X_i)\,k_{h_\tau}(\tau - \tau_i)\,k_{h_r}(r_{t,\tau} - r_{t_i,\tau_i})}. \qquad (12.3.37)$$

¹⁷See Lo and MacKinlay (1988), for example.

¹⁸See Hutchinson, Lo, and Poggio (1994) and Ait-Sahalia (1996a) for other nonparametric option pricing alternatives.


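The option's delta and SPD estimator then follow by differentiating Ĝ:

$$\hat\Delta(P_t, X, \tau, r_{t,\tau}) \equiv \frac{\partial \hat G(P_t, X, \tau, r_{t,\tau})}{\partial P_t}, \qquad (12.3.38)$$

$$\hat f_t^*(P_T) = e^{r_{t,\tau}\tau}\left[\frac{\partial^2 \hat G(P_t, X, \tau, r_{t,\tau})}{\partial X^2}\right]_{X = P_T}. \qquad (12.3.39)$$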

Under standard regularity assumptions on the data-generating process
as well as smoothness assumptions on the true call-pricing function, Ait-Sahalia and Lo (1996) show that the estimators of the option price, the
option's delta, and the SPD are all consistent and asymptotically normal,
and they provide explicit expressions for the asymptotic variances.
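The estimators (12.3.37) and (12.3.39) can be prototyped in a few lines. This is a simplified illustration, not Ait-Sahalia and Lo's implementation. It uses Gaussian univariate kernels, hand-picked bandwidths, finite differences for the strike derivatives, and noise-free synthetic call prices generated by the Black-Scholes formula on a single day. Because P_t, τ, and r do not vary across those observations, their kernels cancel. Kernel smoothing with bandwidth h_X effectively convolves the SPD with a normal density of standard deviation h_X. The estimate should therefore still integrate to about one, with a mean close to P_t e^{rτ}.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def bs_call(S, X, tau, r, sigma):
    """Black-Scholes call, used here only to manufacture synthetic option prices."""
    sd = sigma * math.sqrt(tau)
    d1 = (np.log(S / X) + (r + 0.5 * sigma**2) * tau) / sd
    return S * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d1 - sd)


def kernel_call_price(query, data, G, bandwidths):
    """(12.3.37): Nadaraya-Watson estimate of G at each query row.

    query: (m, 4) rows of (P_t, X, tau, r); data: (n, 4) observed rows; G: (n,).
    The kernel is a product of four univariate Gaussian kernels, as in (12.3.36).
    """
    z = (query[:, None, :] - data[None, :, :]) / bandwidths
    w = np.exp(-0.5 * np.sum(z**2, axis=2))
    return (w @ G) / w.sum(axis=1)


def kernel_spd(P_t, tau, r, strikes, data, G, bandwidths, dX=0.25):
    """(12.3.39): exp(r tau) times the second strike derivative of G_hat."""
    def g_hat(x):
        q = np.column_stack([np.full_like(x, P_t), x, np.full_like(x, tau), np.full_like(x, r)])
        return kernel_call_price(q, data, G, bandwidths)

    curvature = (g_hat(strikes + dX) - 2.0 * g_hat(strikes) + g_hat(strikes - dX)) / dX**2
    return math.exp(r * tau) * curvature


# Synthetic cross-section of calls on one day (hypothetical parameter values).
P_t, tau, r, sigma = 50.0, 0.25, 0.05, 0.2
X_obs = np.linspace(20.0, 100.0, 161)
n = X_obs.size
data = np.column_stack([np.full(n, P_t), X_obs, np.full(n, tau), np.full(n, r)])
G = bs_call(P_t, X_obs, tau, r, sigma)
bandwidths = np.array([1.0, 1.5, 0.05, 0.01])    # (h_P, h_X, h_tau, h_r), chosen by hand

grid = np.linspace(30.0, 75.0, 451)
dx = grid[1] - grid[0]
f_hat = kernel_spd(P_t, tau, r, grid, data, G, bandwidths)
mass = np.sum(f_hat) * dx
mean = np.sum(grid * f_hat) * dx / mass
print(f"mass {mass:.4f}, mean {mean:.3f}, forward {P_t * math.exp(r * tau):.3f}")
```

With real data the observations differ in P_t, τ, and r, and all four bandwidths matter. The delta (12.3.38) can be approximated the same way, by differencing Ĝ in its first argument.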

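Armed with the SPD, any derivative security with characteristic ς and
payoff function Y(ς, P_T) at T > t can now be priced at date t by the pricing
function:

$$\hat G(P_t, \tau, r_{t,\tau}) = e^{-r_{t,\tau}\tau}\int_0^\infty Y(\varsigma, P_T)\,\hat f_t^*(P_T \mid P_t, \tau, r_{t,\tau})\,dP_T. \qquad (12.3.40)$$

If the payoff function Y(·) is twice-differentiable in P_T, then

$$\hat G(P_t, \tau, r_{t,\tau}) = e^{-r_{t,\tau}\tau}\int_0^\infty Y(\varsigma, P_T)\,\hat f_t^*(P_T \mid P_t, \tau, r_{t,\tau})\,dP_T \qquad (12.3.41)$$

$$= \int_0^\infty Y(\varsigma, P_T)\,\frac{\partial^2 \hat G(P_t, P_T, \tau, r_{t,\tau})}{\partial X^2}\,dP_T \qquad (12.3.42)$$

$$= \int_0^\infty \frac{\partial^2 Y(\varsigma, P_T)}{\partial P_T^2}\,\hat G(P_t, P_T, \tau, r_{t,\tau})\,dP_T, \qquad (12.3.43)$$

where (12.3.43) follows from integrating (12.3.42) by parts twice, assuming the boundary terms vanish.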


Integrating against Ĝ instead of its second derivative speeds up the convergence rate of the estimator. Ĝ converges at speed n^{1/2}h^{d/2}, and its integral
against a smooth function of P_T converges at speed n^{1/2}h^{(d−1)/2}, whereas the
second derivative of Ĝ converges at n^{1/2}h^{d/2+2} and its integral against a smooth
function of P_T at n^{1/2}h^{(d−1)/2+2}. A factor of h² is gained in the speed of convergence by integrating the second derivative of the payoff function (when it
exists) against Ĝ instead of integrating the payoff function itself against the
second derivative of Ĝ.
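The gain from (12.3.43) rests on an integration-by-parts identity that is easy to check numerically. The sketch below assumes Black-Scholes call prices and the hypothetical payoff Y(P_T) = P_T². That payoff has Y(0) = Y'(0) = 0, so the boundary terms vanish and the price equals the integral of 2G over strikes. Under the lognormal model the exact answer is P_t² e^{(r+σ²)τ}.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def bs_call(S, X, tau, r, sigma):
    """Black-Scholes call price as a function of the strike X (vectorized)."""
    sd = sigma * math.sqrt(tau)
    d1 = (np.log(S / X) + (r + 0.5 * sigma**2) * tau) / sd
    return S * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d1 - sd)


def price_via_payoff_curvature(Y2, call_fn, strikes):
    """Trapezoid approximation of the integral of Y''(X) G(X) dX, as in (12.3.43)."""
    integrand = Y2(strikes) * call_fn(strikes)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(strikes)))


S, tau, r, sigma = 100.0, 0.5, 0.05, 0.2
strikes = np.linspace(1e-6, 400.0, 200001)
price = price_via_payoff_curvature(
    lambda x: np.full_like(x, 2.0),                  # Y(P) = P**2, so Y''(P) = 2
    lambda x: bs_call(S, x, tau, r, sigma),
    strikes,
)
exact = S**2 * math.exp((r + sigma**2) * tau)
print(price, exact)
```

The integrand involves Ĝ itself rather than its second derivative. With a nonparametric Ĝ, that is the source of the faster convergence described above.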
Ait-Sahalia and Lo (1996) apply this estimator to the pricing and delta-hedging of S&P 500 call and put options using daily data obtained from
the Chicago Board Options Exchange for the sample period from January
4, 1993 to December 31, 1995, yielding a total sample size of 4,431 observations. The estimates of the SPDs exhibit negative skewness and excess
kurtosis, a common feature of historical stock returns (see Chapter 1, for
example). Also, unlike many parametric option pricing models, the SPD-generated pricing formula is capable of capturing persistent volatility
"smiles" and other empirical features of market prices.


12.4 Artificial Neural Networks 


An alternative to nonparametric regression that has received much recent
attention in the engineering and business communities is the artificial neural
network. Artificial neural networks may be viewed as a nonparametric technique, hence these models would fit quite naturally in Section 12.3. However, because initially they drew their motivation from biological phenomena (in particular, from the physiology of nerve cells), they have become
part of a separate, distinct, and burgeoning literature (see Hertz, Krogh,
and Palmer [1991], Hutchinson, Lo, and Poggio [1994], Poggio and Girosi
[1990], and White [1992] for overviews of this literature).

To underscore the common nonparametric origins of artificial neural
networks, we describe three kinds of networks in this section, collectively
known as learning networks (see Barron and Barron [1988]). In Section
12.4.1 we introduce the multilayer perceptron, perhaps the most popular
type of artificial neural network in the recent literature; this is what the term
"neural network" is usually taken to mean. In Sections 12.4.2 and 12.4.3 we
present two other techniques that also have network interpretations: radial
basis functions and projection pursuit regression.


12.4.1 Multilayer Perceptrons 


Perhaps the simplest example of an artificial neural network is the binary
threshold model of McCulloch and Pitts (1943), in which an output variable Y
taking on only the values zero and one is nonlinearly related to a collection
of J input variables X_j, j = 1, ..., J, in the following way:

$$Y = g\left(\sum_{j=1}^{J}\beta_j X_j - \mu\right), \qquad (12.4.1)$$

$$g(u) = \begin{cases} 1 & \text{if } u \ge 0, \\ 0 & \text{if } u < 0. \end{cases} \qquad (12.4.2)$$
Figure 12.7. Binary Threshold Model


According to (12.4.1), each input X_j is weighted by a coefficient β_j, called
the connection strength, and then summed across all inputs. If this weighted
sum exceeds the threshold μ, then the artificial neuron is switched on or
activated via the activation function g(·); otherwise it remains dormant. This
simple network is often represented graphically as in Figure 12.7, in which
the input layer is said to be connected to the output layer.
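A binary threshold unit takes only a line or two of NumPy. In this sketch the weights β = (1, 1) and the threshold μ = 1.5 are hypothetical values, chosen so that the unit computes the logical AND of two binary inputs. The step function g(·) follows (12.4.2).

```python
import numpy as np


def heaviside(u):
    """Activation function (12.4.2): 1 if u >= 0, else 0."""
    return (np.asarray(u) >= 0).astype(int)


def threshold_unit(X, beta, mu):
    """McCulloch-Pitts unit (12.4.1): Y = g(sum_j beta_j X_j - mu)."""
    return heaviside(X @ beta - mu)


inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(threshold_unit(inputs, np.array([1.0, 1.0]), 1.5))   # [0 0 0 1]
```

Lowering the threshold to μ = 0.5 turns the same unit into a logical OR.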

Generalizations of the binary threshold model form the basis of most
current applications of artificial neural network models. In particular,
to allow for continuous-valued outputs, the Heaviside activation function
(12.4.2) is replaced by the logistic function (see Figure 12.8):

$$g(u) = \frac{1}{1 + e^{-u}}. \qquad (12.4.3)$$
Figure 12.8. Comparison of Heaviside and Logistic Activation Functions

Figure 12.9. Multilayer Perceptron with a Single Hidden Layer


Also, without loss of generality, we set μ to zero since it is always possible
to model a nonzero activation level by defining the first input X_1 ≡ 1, in
which case the negative of that input's connection strength, −β_1, becomes
the activation level.

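But perhaps the most important extension of the binary threshold
model is the introduction of a hidden layer between the input layer and
the output layer. Specifically, let

$$Y = h\left(\sum_{k=1}^{K}\alpha_k\, g(\beta_k' X)\right), \qquad (12.4.4)$$

$$\beta_k \equiv [\beta_{k1}\ \beta_{k2}\ \cdots\ \beta_{kJ}]', \qquad X \equiv [X_1\ X_2\ \cdots\ X_J]',$$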

where h(·) is another arbitrary nonlinear activation function. In this case,
the inputs are connected to multiple hidden units, and at each hidden unit
they are weighted (differently) and transformed by the (same) activation
function g(·). The output of each hidden unit is then weighted yet again,
this time by the α_k's, and summed and transformed by a second activation
function h(·). Such a network configuration is an example of a multilayer
perceptron (MLP), with a single (hidden) layer in this case, and is perhaps the
most common type of artificial neural network among recent applications.
In contrast to Figure 12.7, the multilayer perceptron has a more complex
network topology (see Figure 12.9). This can be generalized in the obvious
way by adding more hidden layers, hence the term multilayer perceptron.
For a given set of inputs and outputs {X_t, Y_t}, MLP approximation
amounts to estimating the parameters of the MLP network (the vectors
β_k and scalars α_k, k = 1, ..., K), typically by minimizing the sum of squared
deviations between the output and the network, i.e., Σ_t [Y_t − Σ_{k=1}^K α_k g(β_k' X_t)]².
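The network (12.4.4) and its least-squares objective translate directly into array code. In this minimal sketch the weight arrays are hypothetical inputs, g(·) is the logistic function (12.4.3), and h(·) defaults to the identity. A column of ones in X would play the role of the first input X_1 ≡ 1 that carries the activation levels.

```python
import numpy as np


def logistic(u):
    """Logistic activation g(u) = 1 / (1 + exp(-u)), as in (12.4.3)."""
    return 1.0 / (1.0 + np.exp(-u))


def mlp_output(X, alpha, B, h=None):
    """Single-hidden-layer perceptron (12.4.4): Y = h(sum_k alpha_k g(beta_k' X)).

    X: (T, J) inputs; B: (K, J) array whose rows are beta_k'; alpha: (K,) weights.
    """
    hidden = logistic(X @ B.T)          # (T, K) outputs of the hidden units
    out = hidden @ alpha
    return out if h is None else h(out)


def sse(Y, X, alpha, B):
    """Training objective: sum_t [Y_t - sum_k alpha_k g(beta_k' X_t)]^2."""
    return float(np.sum((Y - mlp_output(X, alpha, B)) ** 2))


# With all inputs zero every hidden unit outputs g(0) = 0.5, so Y = 0.5 * (1 + 2) = 1.5.
X0 = np.zeros((3, 2))
alpha = np.array([1.0, 2.0])
B = np.ones((2, 2))
print(mlp_output(X0, alpha, B))   # [1.5 1.5 1.5]
```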
In the terminology of this literature, the process of parameter estimation is
called training the network. This is less pretentious than it may appear to
be: an early method of parameter estimation was backpropagation,¹⁹ and this
does mimic a kind of learning behavior (albeit a very simplistic one). However, White (1992) cites a number of practical disadvantages with backpropagation (numerical instabilities, occasional non-convergence, etc.), hence
the preferred method for estimating the parameters of (12.4.4) is nonlinear
least squares.

Even the single hidden-layer MLP (12.4.4) possesses the universal approximation property: it can approximate any nonlinear function to an arbitrary degree of accuracy with a suitable number of hidden units (see White
[1992]). However, the universal approximation property is shared by many
nonparametric estimation techniques, including the nonparametric regression estimator of Section 12.3 and the techniques in Sections 12.4.2 and
12.4.3. Of course, this tells us nothing about the performance of such techniques in practice, and for a given set of data it is possible for one technique
to dominate another in accuracy and in other ways.

Perhaps the most important advantage of MLPs is their ability to approximate complex nonlinear relations through the composition of a network of
relatively simple functions. This specification lends itself naturally to parallel
processing, and although there are currently no financial applications that
exploit this feature of MLPs, this may soon change as parallel-processing
software and hardware become more widely available.

To illustrate the MLP model, we apply it to the artificial dataset generated by (12.3.11). For a network with one hidden layer and five hidden
units, denoted by MLP(1,5), with h(·) set to the identity function, we obtain
the following model:

$$\hat Y_t = 5.282 - 14.576\,g(-1.472 + 1.600X_t) - 5.411\,g(-2.628 + 0.641X_t) - 3.071\,g(13.288 - 2.347X_t) + 6.320\,g(-2.009 + 4.009X_t) + 7.802\,g(-3.816 + 2.484X_t), \qquad (12.4.5)$$

where g(u) = 1/(1 + e^{−u}). This model is plotted in Figure 12.10 and
compares well with the kernel estimator described in Section 12.3.1. Despite
the fact that (12.4.5) looks nothing like the sine function, the MLP nevertheless
performs quite well numerically and is relatively easy to estimate.
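A network like (12.4.5) can be fit by nonlinear least squares. The sketch below is not the procedure that produced (12.4.5). It assumes X_t drawn uniformly on [0, 2π] with Y_t = sin(X_t) + 0.5ε_t and a random starting point of our choosing. The fit uses a basic Levenberg-Marquardt iteration: Gauss-Newton steps damped by a ridge term λ, each accepted only when the sum of squared errors falls. Different data or starting values will give different, and not necessarily unique, coefficients, one symptom of the nonconvexity discussed in Section 12.4.4.

```python
import numpy as np


def logistic(u):
    return 1.0 / (1.0 + np.exp(-np.clip(u, -50.0, 50.0)))


def mlp_1_5(theta, x):
    """MLP(1,5), identity output: a0 + sum_k a_k g(b0_k + b1_k x).

    theta = [a0, a_1..a_5, b0_1..b0_5, b1_1..b1_5]; returns fitted values and g(.) matrix.
    """
    a0, a, b0, b1 = theta[0], theta[1:6], theta[6:11], theta[11:16]
    G = logistic(b0[None, :] + np.outer(x, b1))           # (T, 5) hidden-unit outputs
    return a0 + G @ a, G


def jacobian(theta, x):
    """Derivatives of the fitted values with respect to each element of theta."""
    _, G = mlp_1_5(theta, x)
    dG = G * (1.0 - G) * theta[1:6][None, :]              # d yhat / d b0_k
    return np.column_stack([np.ones_like(x), G, dG, dG * x[:, None]])


def fit_levenberg_marquardt(x, y, theta, n_iter=300, lam=1e-2):
    sse = np.sum((y - mlp_1_5(theta, x)[0]) ** 2)
    for _ in range(n_iter):
        J = jacobian(theta, x)
        resid = y - mlp_1_5(theta, x)[0]
        step = np.linalg.solve(J.T @ J + lam * np.eye(theta.size), J.T @ resid)
        new_sse = np.sum((y - mlp_1_5(theta + step, x)[0]) ** 2)
        if new_sse < sse:                                 # accept: move toward Gauss-Newton
            theta, sse, lam = theta + step, new_sse, max(lam / 3.0, 1e-9)
        else:                                             # reject: damp more heavily
            lam = min(lam * 3.0, 1e9)
    return theta, sse


rng = np.random.default_rng(0)
T = 500
x = rng.uniform(0.0, 2.0 * np.pi, T)
y = np.sin(x) + 0.5 * rng.standard_normal(T)

b1 = rng.normal(0.0, 2.0, 5)
centers = rng.uniform(0.0, 2.0 * np.pi, 5)                 # where each logistic unit switches
theta0 = np.concatenate([[y.mean()], rng.normal(0.0, 1.0, 5), -b1 * centers, b1])
mse0 = np.mean((y - mlp_1_5(theta0, x)[0]) ** 2)
theta_hat, sse_hat = fit_levenberg_marquardt(x, y, theta0)
print(f"MSE before {mse0:.3f}, after {sse_hat / T:.3f}")
```

Because only error-reducing steps are accepted, the fit never gets worse than the starting point. It can, however, stop at a local rather than a global minimum.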


¹⁹Backpropagation is essentially the method of stochastic approximation first proposed by Robbins and Monro (1951). See White (1992) for further details.
Figure 12.10. MLP(1,5) Model of Y_t = Sin(X_t) + 0.5ε_t


12.4.2 Radial Basis Functions 


The class of radial basis functions (RBFs) was first used to solve interpolation
problems, that is, fitting a curve exactly through a set of points (see Powell [1987]
for a review). More recently, RBFs have been extended by several researchers
to perform the more general task of approximation (see Broomhead and
Lowe [1988], Moody and Darken [1989], and Poggio and Girosi [1990]). In
particular, Poggio and Girosi (1990) show how RBFs can be derived from the
classical regularization problem in which some unknown function m(X)
is to be approximated given a sparse dataset (X_t, Y_t) and some smoothness
constraints. In terms of our multiple-regression analogy, the d-dimensional
vector X_t contains the explanatory variables, Y_t is the dependent variable, and m(·) is
the possibly nonlinear function that is the conditional expectation of Y_t
given X_t, and hence

$$Y_t = m(X_t) + \epsilon_t, \qquad E[\epsilon_t \mid X_t] = 0. \qquad (12.4.6)$$


The regularization (or nonparametric estimation) problem may then be
viewed as the minimization of the following objective functional:

$$\mathcal{H}(\hat m) \equiv \sum_{t=1}^{T}\left(\lVert Y_t - \hat m(X_t)\rVert^2 + \lambda\,\lVert \mathbf{D}\,\hat m(X_t)\rVert^2\right), \qquad (12.4.7)$$

where ‖·‖ is some vector norm and D is a differential operator. The first
term of the sum in (12.4.7) is simply the distance between m̂(X_t) and the
observation Y_t, the second term is a penalty function that is a decreasing
function of the smoothness of m̂(·), and λ controls the tradeoff between
smoothness and fit.

In its most general form, and under certain conditions (see, for example, Poggio and Girosi [1990]), the solution to the minimization of (12.4.7)
is given by the following expression:

$$\hat m(X) = \sum_{k=1}^{K}\beta_k\, h_k\!\left(\lVert X - U_k\rVert\right) + P(X), \qquad (12.4.8)$$

where {U_k} are d-dimensional vector centers (similar to the knots of spline
functions), {β_k} are scalar coefficients, {h_k} are scalar functions, P(·) is a
polynomial, and K is typically much less than the number of observations T
in the sample. Such approximants have been termed hyperbasis functions by
Poggio and Girosi (1990) and are closely related to splines, smoothers such
as kernel estimators, and other nonparametric estimators.²⁰

For our current purposes, we shall take the vector norm to be a weighted
Euclidean norm defined by a (d × d) weighting matrix W, and we shall take
the polynomial term to be just the linear and constant terms, yielding the
following specification for m̂(·):

$$\hat m(X) = \alpha_0 + \alpha_1'X + \sum_{k=1}^{K}\beta_k\, h_k\!\left((X - U_k)'W'W(X - U_k)\right), \qquad (12.4.9)$$

where α_0 and α_1 are the coefficients of the polynomial P(·). Micchelli (1986)
shows that a large class of basis functions h_k(·) are appropriate, but the most
common choices for basis functions are Gaussians e^{−x/σ²} and multiquadrics
√(x + σ²).

Networks of this type can generate any real-valued output, but in applications where we have some a priori knowledge of the range of the desired
outputs, it is computationally more efficient to apply some nonlinear transfer function to the outputs to reflect that knowledge. This will be the case
in our application to derivative pricing models, in which some of the RBF
networks will be augmented with an output sigmoid, which maps the range
(−∞, ∞) into the fixed range (0, 1). In particular, the augmented network
will be of the form g(m̂(x)), where g(u) = 1/(1 + e^{−u}).

As with MLPs, RBF approximation for a given set of inputs and outputs {X_t, Y_t} involves estimating the parameters of the RBF network (the
d(d+1)/2 unique entries of the matrix W'W, the dK elements of the centers
{U_k}, and the d+K+1 coefficients α_0, α_1, and {β_k}), typically by nonlinear
least squares, i.e., by minimizing Σ_t [Y_t − m̂(X_t)]² numerically.

²⁰To economize on terminology, here we use RBFs to encompass both the interpolation techniques used by Powell (1987) and their subsequent generalizations.
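Estimating all the parameters in (12.4.9) jointly requires nonlinear least squares. The simpler variant below is offered only as an illustration. It fixes the centers U_k and the matrix W in advance (here, as assumptions, eight centers evenly spaced over the data and W = 1). It takes h_k to be the Gaussian e^{−x} applied to the weighted squared distance. With U_k and W given, (12.4.9) is linear in α_0, α_1, and {β_k}, so those coefficients are estimated by ordinary least squares. The simulated data are hypothetical.

```python
import numpy as np


def rbf_design(X, centers, W):
    """Regressors of (12.4.9): [1, X, exp(-(X - U_k)' W'W (X - U_k)) for each k].

    X: (T, d) inputs; centers: (K, d) array of U_k; W: (d, d) weighting matrix.
    """
    diff = X[:, None, :] - centers[None, :, :]            # (T, K, d): X_t - U_k
    Wd = diff @ W.T                                       # W (X_t - U_k)
    basis = np.exp(-np.sum(Wd**2, axis=2))                # Gaussian basis functions
    return np.column_stack([np.ones(X.shape[0]), X, basis])


def fit_rbf_fixed_centers(X, Y, centers, W):
    """OLS for alpha_0, alpha_1, and beta_k with centers and W held fixed."""
    D = rbf_design(X, centers, W)
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return coef, D @ coef


rng = np.random.default_rng(3)
T = 500
X = rng.uniform(0.0, 2.0 * np.pi, (T, 1))
Y = np.sin(X[:, 0]) + 0.5 * rng.standard_normal(T)
centers = np.linspace(0.0, 2.0 * np.pi, 8).reshape(-1, 1)
coef, fitted = fit_rbf_fixed_centers(X, Y, centers, np.array([[1.0]]))
print("coefficients:", np.round(coef, 3))
```

Freeing the centers and W turns this into the nonlinear least-squares problem described in the text, with this linear fit a natural starting point.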


12.4.3 Projection Pursuit Regression 


Projection pursuit is a method that emerged from the statistics community
for analyzing high-dimensional datasets by looking at their low-dimensional
projections. Friedman and Stuetzle (1981) developed a version for the
nonlinear regression problem called projection pursuit regression (PPR).
Similar to MLPs, PPR models are composed of projections of the data, i.e.,
products of the data with estimated coefficients, but unlike MLPs they also
estimate the nonlinear combining functions from the data. Following the
notation of Section 12.4.1, the formulation for PPR with a univariate output
can be written as

$$\hat m(X) = \alpha_0 + \sum_{k=1}^{K}\alpha_k\, m_k(\beta_k' X), \qquad (12.4.10)$$

where the functions m_k(·) are estimated from the data (typically with a
smoother), the {α_k} and {β_k} are coefficients, K is the number of projections,
and α_0 is commonly taken to be the sample mean of the outputs Y_t. The
similarities between PPR, RBF, and MLP networks should be apparent from
(12.4.10).
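A one-term version of (12.4.10) in two dimensions can be sketched as follows. This is a simplified illustration rather than Friedman and Stuetzle's algorithm. The direction β = (cos a, sin a) is found by a grid search over angles. The ridge function m_1(·) is a leave-one-out Gaussian kernel smoother with a hand-picked bandwidth, α_1 is absorbed into m_1(·), and α_0 is the sample mean of the outputs, as in the text. The simulated data are hypothetical.

```python
import numpy as np


def loo_smooth(z, y, h):
    """Leave-one-out Gaussian kernel smoother of y on the scalar projection z."""
    w = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    np.fill_diagonal(w, 0.0)
    return (w @ y) / w.sum(axis=1)


def ppr_one_term(X, Y, h=0.3, n_angles=180):
    """Choose beta = (cos a, sin a) minimizing the leave-one-out sum of squared errors."""
    alpha0 = Y.mean()                                    # alpha_0: sample mean of outputs
    resid = Y - alpha0
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    sse = []
    for a in angles:
        z = X @ np.array([np.cos(a), np.sin(a)])         # projection beta'X
        sse.append(np.sum((resid - loo_smooth(z, resid, h)) ** 2))
    best = angles[int(np.argmin(sse))]
    return np.array([np.cos(best), np.sin(best)]), best, alpha0


rng = np.random.default_rng(2)
T = 500
X = rng.standard_normal((T, 2))
true_angle = 0.6
index = X @ np.array([np.cos(true_angle), np.sin(true_angle)])
Y = np.sin(2.0 * index) + 0.3 * rng.standard_normal(T)
beta_hat, angle_hat, alpha0 = ppr_one_term(X, Y)
print("estimated direction:", np.round(beta_hat, 3))
```

Directions are only identified up to sign, which is why the search covers angles in [0, π). Additional terms would be fit the same way to the residuals of the first.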


12.4.4 Limitations of Learning Networks


Despite the many advantages that learning networks possess for approximating nonlinear functions, they have several important limitations. In
particular, there are currently no widely accepted procedures for determining the network architecture in a given application, e.g., the number of
hidden layers, the number of hidden units, the specification of the activation function(s), etc. Although some rules of thumb have emerged from
casual empirical observation, they are heuristic at best.

Difficulties also arise in training the network. Typically, network parameters are obtained by minimizing the sum of squared errors, but because
of the nonlinearities inherent in these specifications, the objective function
may not be globally convex and can have many local minima.

Finally, traditional techniques of statistical inference such as significance
testing cannot always be applied to network models because of the nesting of
layers. For example, if one of the α_k's in (12.4.4) is zero, then the connection
strengths β_k of that hidden unit are unidentified. Therefore, even simple
significance tests and confidence intervals require complex combinations
of maintained hypotheses to be interpreted properly.
12.4.5 Application: Learning the Black-Scholes Formula


Given the power and flexibility of neural networks to approximate complex 
nonlinear relations, a natural application is to derivative securities whose 
pricing formulas are highly nonlinear even when they are available in closed 
form. In particular, Hutchinson, Lo, and Poggio (1994) pose the following
challenge: If option prices were truly determined by the Black-Scholes formula exactly, can neural networks "learn" the Black-Scholes formula? In
more standard statistical jargon: Can the Black-Scholes formula be estimated nonparametrically via learning networks with a sufficient degree of
accuracy to be of practical use?

Hutchinson, Lo, and Poggio (1994) face this challenge by performing 
Monte Carlo simulation experiments in which various neural networks are
trained on artificially generated Black-Scholes option prices and then com- 
pared to the Black-Scholes formula both analytically and in out-of-sample 
hedging experiments to see how close they come. Even with training sets 
of only six months of daily data, learning network pricing formulas can 
approximate the Black-Scholes formula with reasonable accuracy. 

Specifically, they begin by simulating a two-year sample of daily stock
prices and creating a cross-section of options each day according to the
rules used by the Chicago Board Options Exchange, with prices given by
the Black-Scholes formula. They refer to this two-year sample of stock and
(multiple) option prices as a single training path, since the network is trained
on this sample.²¹ Given a simulated training path {P(t)} of daily stock prices,
they construct a corresponding path of option prices according to the rules
of the Chicago Board Options Exchange (CBOE) for introducing options
on stocks.

A typical training path is shown in Figure 12.11. Because the options
generated for a particular sample path are a function of the (random) stock
price path, the size of this data matrix (in terms of number of options and
total number of data points) varies across sample paths. For their training
set, the number of options per sample path ranges from 71 to 91, with an
average of 81. The total number of data points ranges from 5,227 to 6,847,
with an average of 6,001.


The nonlinear models obtained from neural networks yield estimates 


²¹ They assume that the underlying asset for the simulation experiments is a typical NYSE
stock, with an initial price $P(0) = \$50.00$, an annual continuously compounded expected rate
of return $\mu$ of 10%, and an annual volatility $\sigma$ of 20%. Under the Black-Scholes assumption of
a geometric Brownian motion,

$$dP = \mu P\,dt + \sigma P\,dB,$$

and taking the number of days per year to be 253, they draw 506 pseudorandom variates $\{\epsilon_i\}$
from the distribution $N(\mu/253,\ \sigma^2/253)$ to obtain two years of daily continuously compounded
returns, which are converted to prices with the usual relation $P(t) = P(0)\exp\big(\sum_{i=1}^{t}\epsilon_i\big)$
for $t > 0$.
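The price-path simulation in footnote 21 can be sketched in a few lines of NumPy. This is a minimal sketch that assumes the parameter values stated in the footnote. The function name `simulate_training_path` and the seed are our own. The CBOE rules for introducing options along the path are not given in the text and are not reproduced here.

```python
import numpy as np

def simulate_training_path(p0=50.0, mu=0.10, sigma=0.20,
                           days_per_year=253, n_days=506, seed=0):
    """Daily prices P(0), ..., P(n_days), built as described in footnote 21.

    Daily continuously compounded returns are i.i.d. N(mu/253, sigma^2/253),
    and P(t) = P(0) * exp(eps_1 + ... + eps_t).
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(loc=mu / days_per_year,
                     scale=sigma / np.sqrt(days_per_year),  # std = sqrt(sigma^2/253)
                     size=n_days)
    # cumulative log return, starting at 0 so that P(0) = p0 exactly
    cum_log_return = np.concatenate(([0.0], np.cumsum(eps)))
    return p0 * np.exp(cum_log_return)

path = simulate_training_path()
print(len(path), path[0], round(float(path[-1]), 2))
```

A different `seed` produces a different training path. This mirrors the way the number of options and data points varies from one path to the next.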
Figure 12.11. Typical Simulated Training Path (see the text for parameters)
Dashed line represents stock price, while the arrows represent the options on the stock. The
y-coordinate of the tip of the arrow indicates the strike price (arrows are slanted to make different
introduction and expiration dates visible).


of option prices and deltas that are difficult to distinguish visually from the
true Black-Scholes values. An example of the estimates and errors for an
RBF network is shown in Figure 12.12. The estimated equation for this
particular RBF network is
Figure 12.12. Typical Behavior of Four-Nonlinear-Term RBF Model


P/ X ~ 1.05 59.79 —0.03 P/ X — 1.05 +1.62 
т+0.10 | 0.03 1024|| т+0.10 з 
+ 0.14 P/X − 0.24τ − 0.01,   (12.4.11)


where $\tau \equiv T - t$. Observe from (12.4.11) that the centers in the RBF
model are not constrained to lie within the range of the inputs, and in fact
they do not for the third and fourth centers in this example. The largest errors
in these networks tend to occur at the kink-point for options at the money 
at expiration, and also along the boundary of the sample points. 
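The idea of fitting a Black-Scholes price surface with radial basis functions can be illustrated with the simplified sketch below. The simplification is substantial. The Gaussian centers are fixed at four randomly chosen data points with a common width, and only the output weights (together with linear terms in P/X and τ and a constant) are estimated, by least squares. Hutchinson, Lo, and Poggio (1994) also estimate the centers, which is why their centers can lie outside the input range. The interest rate, volatility, input ranges, width, and all function names are our own illustrative assumptions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x) / math.sqrt(2.0)))

def bs_call_over_strike(s_over_x, tau, r=0.05, sigma=0.20):
    """Black-Scholes call price divided by the strike, C/X, as a function of S/X and tau."""
    d1 = (np.log(s_over_x) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return s_over_x * norm_cdf(d1) - np.exp(-r * tau) * norm_cdf(d2)

def rbf_design(inputs, centers, width):
    """Gaussian basis functions of the inputs, plus linear terms and a constant."""
    sq_dist = ((inputs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    gauss = np.exp(-sq_dist / (2.0 * width**2))
    return np.column_stack([gauss, inputs, np.ones(len(inputs))])

rng = np.random.default_rng(1)
inputs = np.column_stack([rng.uniform(0.8, 1.2, 500),     # S/X
                          rng.uniform(0.02, 0.75, 500)])  # tau = T - t, in years
target = bs_call_over_strike(inputs[:, 0], inputs[:, 1])

centers = inputs[rng.choice(len(inputs), size=4, replace=False)]
design = rbf_design(inputs, centers, width=0.2)
weights, *_ = np.linalg.lstsq(design, target, rcond=None)  # output weights only
rmse = np.sqrt(np.mean((design @ weights - target) ** 2))
print(f"in-sample RMSE of C/X: {rmse:.5f}")
```

The design matrix includes a constant, so the in-sample RMSE can never exceed the standard deviation of the target. Out-of-sample and hedging performance, which the text stresses, would have to be checked separately.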
While the accuracy of the learning network prices is obviously of great 
interest, this alone is not sufficient to ensure the practical relevance of the 
nonparametric approach. In particular, the ability to hedge an option posi- 
tion is as important, since the very existence of an arbitrage-based pricing 
formula is predicated on the ability to replicate the option through a dynamic
hedging strategy (see the discussion in Chapter 9). This additional
constraint provides additional motivation for regularization techniques and,
specifically, the RBF networks used by Hutchinson, Lo, and Poggio (1994). 

In particular, delta-hedging strategies require an accurate approximation
of the derivative of the underlying pricing formula, and the need for
accurate approximations of derivatives leads directly to the smoothness
constraint imposed by regularization techniques such as RBF networks.²²
Hutchinson, Lo, and Poggio (1994) show that both RBF and MLP networks 
provide excellent delta-hedging capabilities for the simulated Black-Scholes 
data as well as in an empirical application to S&P 500 futures options, in a 
few cases outperforming the Black-Scholes formula (recall that the formula 
is derived under the assumption that delta-hedging is performed continu- 
ously, whereas these simulations assume daily delta-hedging). 
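The gap between continuous and daily hedging can be made concrete with the sketch below. It sells one Black-Scholes-priced call, holds N(d1) shares rebalanced once per trading day, keeps the remainder in a bank account, and reports the terminal replication error. The strike, interest rate, six-month maturity, seed, and function names are our own assumptions. Under continuous hedging the error would be zero; with daily rebalancing it generally is not.

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _d1(s, k, tau, r, sigma):
    return (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))

def bs_call(s, k, tau, r, sigma):
    """Black-Scholes European call price (the payoff at expiration)."""
    if tau <= 0.0:
        return max(s - k, 0.0)
    d1 = _d1(s, k, tau, r, sigma)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d1 - sigma * math.sqrt(tau))

def bs_delta(s, k, tau, r, sigma):
    """Black-Scholes call delta N(d1); 0 or 1 at expiration."""
    if tau <= 0.0:
        return 1.0 if s > k else 0.0
    return norm_cdf(_d1(s, k, tau, r, sigma))

def daily_hedge_error(prices, k, r, sigma, days_per_year=253):
    """Replicating-portfolio value minus call payoff, rebalancing once per day.

    The option is assumed to expire on the last date in `prices`.
    """
    n = len(prices) - 1
    dt = 1.0 / days_per_year
    delta = bs_delta(prices[0], k, n * dt, r, sigma)
    cash = bs_call(prices[0], k, n * dt, r, sigma) - delta * prices[0]
    for i in range(1, n + 1):
        cash *= math.exp(r * dt)                  # interest on the bank account
        tau = (n - i) * dt                        # exactly zero on the last day
        new_delta = bs_delta(prices[i], k, tau, r, sigma)
        cash -= (new_delta - delta) * prices[i]   # trade shares at today's price
        delta = new_delta
    return cash + delta * prices[-1] - max(prices[-1] - k, 0.0)

rng = np.random.default_rng(7)
r, sigma, days = 0.05, 0.20, 126                  # roughly six months of trading days
log_ret = rng.normal((r - 0.5 * sigma**2) / 253, sigma / math.sqrt(253), days)
prices = 50.0 * np.exp(np.concatenate(([0.0], np.cumsum(log_ret))))
premium = bs_call(50.0, 50.0, days / 253, r, sigma)
error = daily_hedge_error(prices, k=50.0, r=r, sigma=sigma)
print(f"premium {premium:.4f}, daily-hedging error {error:.4f}")
```

Replacing `bs_delta` with the derivative of a fitted network pricing formula gives the kind of hedging comparison described above.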

Although parametric derivative pricing formulas are preferred when 
they are available, the results of Hutchinson, Lo, and Poggio (1994) show 
that nonparametric learning-network alternatives can be useful substitutes
when parametric methods fail. While their findings are promising, we can- 
not yet conclude that such an approach will be successful in general—their 
simulations have focused only on the Black-Scholes model, and their empir- 
ical application consists of only a single instrument and time period, S&P 
500 futures options for 1987 to 1991. 

However, this general approach points to a number of promising direc- 
tions for future research. Perhaps the most pressing item on this agenda is 
the specification of additional inputs, inputs that are not readily captured by
parametric models, e.g., the return on the market, general market volatility,
and other measures of business conditions. 

Other research directions are motivated by the need for proper statis- 
tical inference in the specification of learning networks. First, we require 
some method of matching the network architecture (number of nonlinear
units, number of centers, type of basis functions, etc.) to the specific dataset
at hand in some optimal and, preferably, automatic fashion.

Second, the relation between sample size and approximation error
should be explored, either analytically or through additional Monte Carlo
simulation experiments. Perhaps some data-dependent metric can be
constructed that can provide real-time estimates of approximation errors in
much the same way that standard errors may be obtained for typical
statistical estimators.

²² In fact, it is well known that the problem of numerical differentiation is ill-posed. The
classical approach [Reinsch (1967)] is to regularize it by finding a sufficiently smooth function
that solves the variational problem in (12.4.7). As we discussed earlier, RBF networks as well
as splines and several forms of MLP networks follow directly from the regularization approach
and are therefore expected to approximate not only the pricing formula but also its derivatives
(provided the basis function corresponding to a smoothness prior is of a sufficient degree; see
Poggio and Girosi (1990); in particular, the Gaussian is certainly sufficiently smooth for our
problem). A special case of this general argument is the result of Gallant and White (1992) and
Hornik, Stinchcombe, and White (1990), who show that single-hidden-layer MLP networks can
approximate the derivative of an arbitrary nonlinear mapping arbitrarily well as the number
of hidden units increases.

And finally, the need for better performance measures is clear. While
typical measures of goodness-of-fit such as $R^2$ do offer some guidance for
model selection, they are only incomplete measures of performance. Moreover,
the notion of degrees of freedom is no longer well-defined for nonlinear
models, and this has implications for all statistical measures of fit.


12.5 Overfitting and Data-Snooping 


While each of the nonlinear methods discussed in this chapter has its own
costs and benefits, the problems of overfitting and data-snooping affect all of
them to the same degree. Overfitting occurs when a model fits “too well,” in
the sense that the model has captured random noise as well as genuine
nonlinearities. Heuristically, the primary source of overfitting is having too
few “degrees of freedom” or too many parameters relative to the number
of datapoints, and a typical symptom is an excellent in-sample fit but poor
out-of-sample performance.²³ Data-snooping is a related problem that can
lead to excellent but spurious out-of-sample performance. Data-snooping
biases arise when we ignore the fact that many specification searches have
been conducted to obtain the final specification of a model we are fitting
to the data.²⁴ Even if a model is in fact incorrect, by searching long enough
over various datasets and/or parameter values, we are likely to find some
combination that will fit the data. However, this fit is spurious and is merely
a symptom of our extensive search procedure.
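A small simulation shows how a specification search manufactures significance. In the sketch below every candidate predictor is pure noise and so are the "returns". Even so, the best of 200 single-regressor fits typically produces a t-statistic well above conventional critical values, and roughly 5% of the candidates clear |t| > 1.96. The sample size, number of candidates, seed, and helper name are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n_signals = 120, 200
returns = rng.standard_normal(T)               # "returns" that are pure noise
signals = rng.standard_normal((n_signals, T))  # candidate predictors, all useless

def t_stat_of_slope(x, y):
    """OLS t-statistic for the slope b in y = a + b x + e."""
    x_c, y_c = x - x.mean(), y - y.mean()
    b = (x_c @ y_c) / (x_c @ x_c)
    resid = y_c - b * x_c
    se = np.sqrt(resid @ resid / (len(y) - 2) / (x_c @ x_c))
    return b / se

t_stats = np.array([t_stat_of_slope(s, returns) for s in signals])
best = np.argmax(np.abs(t_stats))
print(f"best |t| among {n_signals} useless signals: {abs(t_stats[best]):.2f}")
print(f"share with |t| > 1.96: {np.mean(np.abs(t_stats) > 1.96):.3f}")
```

Reporting only the winning specification, without mentioning the 199 discarded ones, is exactly the data-snooping bias described above.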

Unfortunately, there are no simple remedies to these two problems 
since the procedures that give rise to them are the same procedures that 
produce genuine empirical discoveries. The source of both problems is the 
inability to perform controlled experiments and, consequently, the heavy 
reliance on statistical inference for our understanding of the data. As with 
all forms of statistical inference, there is always a margin of error, and this 
margin is often sensitive to small changes in the way we process the data and 
revise our models. 


²³ The degrees of freedom of a nonlinear model are often difficult to determine because
the notion of a “parameter” may be blurred. For example, the kernel regression may seem
to have only one free parameter (the bandwidth $h$), but this is clearly misleading since each
datapoint serves as a center for local averaging. See Hampel (1986) and Wahba (1990) for
further discussion.

²⁴ See Leamer (1978) and Lo and MacKinlay (1990b) for formal analyses of such biases, and
Black (1993) for a recent example in the finance literature.
Nevertheless, there are several ways to mitigate the effects of overfitting
and data-snooping. For example, the impact of systematic specification
searches may often be calculated explicitly, as in Lo and MacKinlay (1990b).
In such instances, using a corrected statistical distribution for inference will
safeguard against finding significant results where none exist. Careful
out-of-sample performance evaluation can uncover overfitting problems, and
if relatively few out-of-sample tests are conducted, or if they are conducted
over different (and weakly correlated) datasets, this will minimize the effects
of data-snooping.
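As a simple illustration of a corrected distribution, suppose the n searches are independent. Then the chance that the best of them looks significant at level p is 1 − (1 − p)^n, and the per-test level that keeps the overall error rate at α is 1 − (1 − α)^(1/n). This is the textbook Šidák adjustment. It is not the order-statistics analysis of Lo and MacKinlay (1990b), and the independence assumption is ours. The function names are hypothetical.

```python
def sidak_adjusted_p(p_single, n_searches):
    """Chance that at least one of n independent null searches has p-value <= p_single."""
    return 1.0 - (1.0 - p_single) ** n_searches

def sidak_per_test_level(alpha, n_searches):
    """Per-test significance level that keeps the familywise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_searches)

for n in (1, 5, 20, 100):
    print(n, round(sidak_adjusted_p(0.05, n), 4), round(sidak_per_test_level(0.05, n), 5))
```

With 20 independent searches, a nominal 5% result has roughly a 64% chance of appearing somewhere purely by luck. Correlated searches make the adjustment less severe, but harder to compute.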


But perhaps the most effective means of reducing the impact of over- 
fitting and data-snooping is to impose some discipline on the specification 
search by a priori theoretical considerations. These considerations may be 
in the form of well-articulated mathematical models of economic behavior,
or behavioral models motivated by psychological phenomena, or simply
heuristic rules of thumb based on judgment, intuition, and past experience.
While all of these sources are also affected by data-snooping and overfitting
to some extent (no form of inference can escape these problems), they are
less susceptible and offer a less data-dependent means of model validation.

All this suggests the need for an a priori framework or specification 
for the model before confronting the data. By proposing such a specification,
along with the kinds of phenomena one is seeking to capture and the
relevant variables to be used in the search, the chance of coming upon a
spuriously successful model is reduced.


12.6 Conclusion 


Nonlinearities are clearly playing a more prominent role in financial applications,
thanks to increases in computing power and the availability of large
datasets. Unlike the material presented in earlier chapters, some of the
ideas in this chapter are less well-established and more tentative. Within a
short time many of the techniques we have covered will be refined, and some
may become obsolete. Nevertheless, it is important to develop a sense of
the direction of research and the open questions to be addressed, especially
at the early stages of these explorations.

Despite the flexibility of the nonlinear models we have considered, they
do have some serious limitations. They are typically more difficult to estimate
precisely, more sensitive to outliers, numerically less stable, and more
prone to overfitting and data-snooping biases than comparable linear mod- 
els. Contrary to popular belief, nonlinear models require more economic 
structure and a priori considerations, not less. And their interpretation often 
requires more effort and care. However, nonlinearities are often a fact of 
economic life, and for many financial applications the sources and nature 
of nonlinearity can be readily identified or, at the very least, characterized in
some fashion. In such situations, the techniques described in this chapter 
are powerful additions to the armory of the financial econometrician. 


Problems—Chapter 12 


12.1 Most pseudorandom number generators implemented on digital computers
are multiplicative linear congruential generators (MLCG), in which
$X_t = (aX_{t-1} + c) \bmod m$, where $a$ is some “well-chosen” multiplier, $c$ is an optional
constant, and $m$ is equal to or slightly smaller than the largest integer that
can be represented in one computer word. (For example, let $a = 1664525$,
$c = 0$, and $m = 2^{32}$.) In contrast to MLCG numbers, consider the following
two nonlinear recursions: the tent map (see Section 12.1.1) and the
logistic map, respectively:


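$$X_t = \begin{cases} 2X_{t-1} & \text{if } X_{t-1} < \tfrac{1}{2} \\ 2(1 - X_{t-1}) & \text{if } X_{t-1} \ge \tfrac{1}{2} \end{cases}, \qquad X_0 \in (0,1), \qquad (12.6.1)$$

$$X_t = 4X_{t-1}(1 - X_{t-1}), \qquad X_0 \in (0,1). \qquad (12.6.2)$$

These recursions are examples of chaotic systems, which exhibit extremely
sensitive dependence on initial conditions and unusually complex dynamic
behavior.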


12.1.1 What are good properties for pseudorandom number generators 
to have, and how should you make comparisons between distinct gener- 
ators in practice (not in theory)? 


12.1.2 Perform various Monte Carlo simulations comparing MLCG to 
the tent and logistic maps to determine which is the better pseudorandom 
number generator. Which is better and why? In deciding which criteria 
to use, think about the kinds of applications for which you will be using 
the pseudorandom number generators. Hint: Use 1.99999999 instead of 
2 in your implementation of (12.6.1), and 3.99999999 instead of 4 in your 
implementation of (12.6.2)—for extra credit: Explain why. 
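A starting point for Problem 12.1.2 is sketched below. It implements the three generators and prints two crude diagnostics, the sample mean and the lag-1 autocorrelation. It uses the example MLCG constants from the problem and the 1.99999999 and 3.99999999 substitutes suggested in the hint. The seeds, sample length, and function names are our own choices. A serious comparison would add distributional tests, higher-order dependence checks, and tests relevant to the intended Monte Carlo use.

```python
def mlcg(seed, n, a=1664525, c=0, m=2**32):
    """X_t = (a X_{t-1} + c) mod m, scaled to [0, 1)."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out

def tent_map(x0, n, slope=1.99999999):
    """Tent map (12.6.1), with the slope 2 replaced as in the hint."""
    x, out = x0, []
    for _ in range(n):
        x = slope * x if x < 0.5 else slope * (1.0 - x)
        out.append(x)
    return out

def logistic_map(x0, n, r=3.99999999):
    """Logistic map (12.6.2), with the constant 4 replaced as in the hint."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x)
    return out

def lag1_autocorr(xs):
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i - 1] - mean) for i in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

for name, draws in [("MLCG", mlcg(12345, 10000)),
                    ("tent", tent_map(0.1234567, 10000)),
                    ("logistic", logistic_map(0.1234567, 10000))]:
    mean = sum(draws) / len(draws)
    print(f"{name:8s} mean={mean:.4f} lag-1 autocorr={lag1_autocorr(draws):+.4f}")
```

Comparing histograms of the three streams is a natural next step, since a mean near 0.5 alone says little about uniformity.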


12.2 Estimate a multilayer perceptron model for monthly returns on the 
S&P 500 index from 1926:1 to 1994:12 using five lagged returns as inputs and 
one hidden layer with ten units. Calculate the in-sample root-mean-squared- 
error (RMSE) of the one-step-ahead forecast of your model and compare 
it to the corresponding out-of-sample results for the test period 1986:1 to 
1994:12. Can you explain the differences in performance (if any)? 
12.3 Use kernel regression to estimate the relation between the monthly
returns of IBM and the S&P 500 from 1965:1 to 1994:12. How would a 
conventional beta be calculated from the results of the kernel estimator? 
Construct at least two measures that capture the incremental value of kernel 
estimation over ordinary least squares. 


Appendix 


THIS APPENDIX PROVIDES a brief introduction to the most commonly used estimation
techniques in financial econometrics. Many other good reference
texts cover this material in more detail, hence we focus only on those aspects
that are most relevant for our immediate purposes. Readers looking for a
more systematic and comprehensive treatment should consult Hall (1992), 
Hamilton (1994), Ogaki (1992), and White (1984). 

We begin by following Hall's (1992) exposition of linear instrumental
variables (IV) estimation in Section A.1 as an intuitive introduction to
Hansen's (1982) Generalized Method of Moments (GMM) estimator. We 
develop the GMM method itself in Section A.2, and discuss methods for 
handling serially correlated and heteroskedastic errors in Section A.3. In 
Section A.4 we relate GMM to maximum likelihood (ML) estimation. 


A.1 Linear Instrumental Variables


Consider a linear relationship between a scalar $y_t$ and a vector $x_t$:
$y_t = x_t'\theta_0 + \epsilon_t(\theta_0)$, $t = 1, \ldots, T$. Stacking the $T$ observations, this can be written as

$$y = X\theta_0 + \epsilon(\theta_0), \qquad (A.1.1)$$

where $y$ is a $(T\times 1)$ vector containing $T$ observations of $y_t$, $X$ is a $(T\times N_X)$
matrix containing $T$ observations of the $N_X$ independent variables in $x_t$, $\theta_0$
is an $(N_X\times 1)$ parameter vector, and $\epsilon(\theta_0)$ is a $(T\times 1)$ vector containing $T$
observations of the error term $\epsilon_t$. The error term is written as a function of
the true parameter vector so that the notation $\epsilon$ can be used for both the true
equation error and the residual of an estimated equation. For simplicity,
assume that the error term is serially uncorrelated and homoskedastic, with
variance $\sigma^2$; thus the variance of $\epsilon(\theta_0)$ is $\mathrm{Var}[\epsilon(\theta_0)] = \sigma^2 I_T$, where $I_T$ is a
$(T\times T)$ identity matrix.
There are also available $N_H$ instruments in an $(N_H\times 1)$ column vector $h_t$.
The $T$ observations of this vector form a $(T\times N_H)$ matrix $H$. The instruments
have the property that $E[h_t\,\epsilon_t(\theta_0)]$ is an $(N_H\times 1)$ vector of zeroes; that is, the
instruments are contemporaneously uncorrelated with the error $\epsilon_t$.¹ The
statement that a particular instrument is uncorrelated with the equation
error is known as an orthogonality condition, and IV regression uses the $N_H$
available orthogonality conditions to estimate the model.

Given an arbitrary coefficient vector $\theta$, we can form the corresponding
residual $\epsilon_t(\theta) = y_t - x_t'\theta$. Stacking this residual into a vector, we get
$\epsilon(\theta) = y - X\theta$. We can also define an $(N_H\times 1)$ column vector containing the
cross-product of the instrument vector with the residual,

$$f_t(\theta) = h_t\,\epsilon_t(\theta). \qquad (A.1.2)$$


The expectation of this cross-product is an $(N_H\times 1)$ vector of zeroes at the
true parameter vector:

$$E[f_t(\theta_0)] = 0. \qquad (A.1.3)$$


The basic idea of IV estimation is to choose coefficients to satisfy this
condition as closely as possible. Of course, we do not observe the true expectation
of $f_t$, and so we must work instead with the sample average of $f_t$. We write this
as $g_T(\theta)$, using the subscript $T$ to indicate dependence on the sample:

$$g_T(\theta) = T^{-1}\sum_{t=1}^{T} f_t(\theta) = T^{-1}\sum_{t=1}^{T} h_t\,\epsilon_t(\theta) = T^{-1}H'\epsilon(\theta). \qquad (A.1.4)$$


Minimum Distance Criterion 

In general there may be more elements of $g_T(\theta)$ than there are coefficients,
and so in general it is not possible to set all the elements of $g_T(\theta)$ to zero.
Instead, we minimize a quadratic form, a weighted sum of squares and
cross-products of the elements of $g_T(\theta)$. We define the quadratic form $Q_T(\theta)$ as

$$Q_T(\theta) = g_T(\theta)'\,W_T\,g_T(\theta) = \big[T^{-1}\epsilon(\theta)'H\big]\,W_T\,\big[T^{-1}H'\epsilon(\theta)\big], \qquad (A.1.5)$$

where $W_T$ is an $(N_H\times N_H)$ symmetric, positive definite weighting matrix.
IV regression chooses $\hat\theta_T$ as the value of $\theta$ that minimizes $Q_T(\theta)$. Substituting
the definition of $\epsilon(\theta)$ into (A.1.5), the first-order condition for the

¹ In many applications, $\epsilon_t$ is a forecast error that is uncorrelated with any variables known in
advance. In this case the instrument vector $h_t$ will include only lagged variables that are known
at time $t-1$ or earlier. Nonetheless we write it as $h_t$ for notational simplicity and generality.
minimization problem is

$$X'H\,W_T\,H'y = X'H\,W_T\,H'X\,\hat\theta_T. \qquad (A.1.6)$$


When the number of instruments, $N_H$, equals the number of parameters to
be estimated, $N_X$, and the matrix $H'X$ is nonsingular, then $X'H\,W_T$ cancels
out of the left- and right-hand sides of (A.1.6) and the minimization gives

$$\hat\theta_T = (H'X)^{-1}H'y. \qquad (A.1.7)$$


This estimate is independent of the weighting matrix $W_T$, since with $N_H =
N_X$ all the orthogonality conditions can be satisfied exactly and there is no
need to trade off one against another. It is easy to see that (A.1.7) gives the
usual formula for an OLS regression coefficient when the instruments H 
are the same as the explanatory variables X. 

More generally, $N_H$ may exceed $N_X$, so that there are more orthogonality
conditions to be satisfied than parameters to be estimated. In this case the
model is overidentified and the solution for $\hat\theta_T$ is

$$\hat\theta_T = \big(X'H\,W_T\,H'X\big)^{-1}X'H\,W_T\,H'y. \qquad (A.1.8)$$
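Equation (A.1.8) translates directly into a few lines of linear algebra. The sketch below uses simulated data with one endogenous regressor and two excluded instruments. The data-generating process, seed, and function name are illustrative assumptions. Choosing $W = (H'H)^{-1}$ gives the 2SLS weighting discussed later, and setting $H = X$ reproduces OLS, as noted after (A.1.7).

```python
import numpy as np

def iv_estimator(y, X, H, W):
    """Linear IV estimator (A.1.8): (X'H W H'X)^{-1} X'H W H'y."""
    XH = X.T @ H                      # X'H, shape (N_X, N_H)
    lhs = XH @ W @ XH.T               # X'H W H'X
    rhs = XH @ W @ (H.T @ y)          # X'H W H'y
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(0)
T = 1000
z = rng.standard_normal((T, 2))                                  # excluded instruments
u = rng.standard_normal(T)                                       # equation error
x1 = z[:, 0] + 0.5 * z[:, 1] + 0.8 * u + rng.standard_normal(T)  # endogenous regressor
X = np.column_stack([np.ones(T), x1])                            # N_X = 2
H = np.column_stack([np.ones(T), z])                             # N_H = 3: overidentified
y = X @ np.array([1.0, 2.0]) + u

theta_iv = iv_estimator(y, X, H, np.linalg.inv(H.T @ H))
theta_ols = iv_estimator(y, X, X, np.eye(2))                     # H = X gives OLS
print("IV, W = (H'H)^{-1}:", theta_iv)
print("OLS (inconsistent here):", theta_ols)
```

Because `x1` is correlated with `u` by construction, OLS is inconsistent here, whereas the IV estimate is consistent for (1, 2).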


Asymptotic Distribution Theory
The next step is to calculate the asymptotic distribution of the parameter
estimate $\hat\theta_T$. Substituting in for $y$ from (A.1.1) and rearranging, we find that

$$\sqrt{T}(\hat\theta_T - \theta_0) = \big(T^{-1}X'H\,W_T\,T^{-1}H'X\big)^{-1}\,T^{-1}X'H\,W_T\,T^{-1/2}H'\epsilon(\theta_0). \qquad (A.1.9)$$

Now suppose that as $T$ increases, $T^{-1}H'H$ converges to $M_{HH}$, a nonsingular
moment matrix, and $T^{-1}X'H$ converges to $M_{XH}$, a moment matrix
of rank $N_X$. Suppose also that as $T$ increases the weighting matrix $W_T$
converges to some symmetric, positive definite limit $W$. Because we have
assumed that the error $\epsilon(\theta_0)$ is serially uncorrelated and homoskedastic, when
the orthogonality conditions hold $T^{-1/2}H'\epsilon(\theta_0)$ converges in distribution to
a normal vector with mean zero and variance-covariance matrix $\sigma^2 M_{HH}$. We
use the notation $S$ for this asymptotic variance-covariance matrix:

$$S = \lim_{T\to\infty}\mathrm{Var}\big[T^{-1/2}H'\epsilon(\theta_0)\big] = \sigma^2 M_{HH}. \qquad (A.1.10)$$


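Using (A.1.4), $S$ can be interpreted more generally as the asymptotic variance
of $T^{1/2}$ times the sample average of $f_t$, that is, $T^{1/2}$ times $g_T$:

$$S = \lim_{T\to\infty}\mathrm{Var}\Big[T^{-1/2}\sum_{t=1}^{T} f_t(\theta_0)\Big] = \lim_{T\to\infty}\mathrm{Var}\big[T^{1/2}g_T(\theta_0)\big]. \qquad (A.1.11)$$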
With these convergence assumptions, (A.1.9) implies that

$$\sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{a} N(0, V), \qquad (A.1.12)$$

where

$$\begin{aligned} V &= (M_{XH}WM_{HX})^{-1}M_{XH}WSWM_{HX}(M_{XH}WM_{HX})^{-1} \\ &= \sigma^2(M_{XH}WM_{HX})^{-1}M_{XH}WM_{HH}WM_{HX}(M_{XH}WM_{HX})^{-1}, \end{aligned} \qquad (A.1.13)$$

and $M_{HX} \equiv M_{XH}'$.


Optimal Weighting Matrix 


We have now shown that the estimator $\hat\theta_T$ is consistent and asymptotically
normal. The final step of the analysis is to pick a weighting matrix $W$ that
minimizes the asymptotic variance matrix $V$ and hence delivers an asymptotically
efficient estimator. It turns out that $V$ is minimized by picking
$W$ equal to any positive scalar times $S^{-1}$. Recall that $S$ is the asymptotic
variance-covariance matrix of the sample average orthogonality conditions
$g_T(\theta)$. Intuitively, one wants to downweight noisy orthogonality conditions
and place more weight on orthogonality conditions that are precisely
measured. Since here $S^{-1} = \sigma^{-2}M_{HH}^{-1}$, it is convenient to set $W$ equal to

$$W^* = M_{HH}^{-1}; \qquad (A.1.14)$$

the formula for $V$ then simplifies to

$$V^* = \sigma^2\big(M_{XH}M_{HH}^{-1}M_{HX}\big)^{-1}. \qquad (A.1.15)$$
In practice one can choose a weighting matrix

$$W_T^* = \big(T^{-1}H'H\big)^{-1}. \qquad (A.1.16)$$

As $T$ increases, $W_T^*$ will converge to $W^*$.
With this weighting matrix the formula for $\hat\theta_T$ becomes

$$\hat\theta_T^* = \big[X'H(H'H)^{-1}H'X\big]^{-1}X'H(H'H)^{-1}H'y = (\hat X'\hat X)^{-1}\hat X'y, \qquad (A.1.17)$$

where $\hat X \equiv H(H'H)^{-1}H'X$ is the predicted value of $X$ in a regression of $X$
on $H$. This is the well-known two-stage least squares (2SLS) estimator. It
can be thought of as a two-stage procedure in which one first regresses $X$ on
$H$, then regresses $y$ on the fitted value from the first stage to estimate the
parameter vector $\theta_0$.


Alternatively, one can think of 2SLS as regressing both $X$ and $y$ on $H$ in
the first stage, and then regressing the fitted value of $y$ on the fitted value of $X$;
exactly the same coefficient estimate (A.1.17) is implied. Note that under
this alternative interpretation, the second-stage regression asymptotically
has an $R^2$ statistic of unity, because the error term in (A.1.1) is orthogonal to
the instruments and therefore has a fitted value of zero when projected on
the instruments. This implies that asymptotically, if (A.1.1) and the
orthogonality conditions hold, then the coefficient estimates should not depend on
which variable is chosen to be the dependent variable in (A.1.1) and which
are chosen to be regressors. Asymptotically, the same coefficient estimates
will be obtained (up to a normalization) whichever way the regression is
written.
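The three routes to 2SLS described above can be checked numerically against each other: the closed form (A.1.17), regressing $y$ on $\hat X$, and regressing the fitted $y$ on $\hat X$. The sketch also computes standard errors from (A.1.18). It uses the same kind of simulated data as before, with all names and parameters our own. One detail is worth flagging: the residuals for $\hat\sigma^2$ use $X$, not $\hat X$, which is why the second-stage OLS standard errors from a naive two-step regression are not the right ones.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients; y may be a vector or a matrix of columns."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_sls(y, X, H):
    """(A.1.17) in closed form: [X'H(H'H)^{-1}H'X]^{-1} X'H(H'H)^{-1}H'y."""
    X_hat = H @ np.linalg.solve(H.T @ H, H.T @ X)    # H(H'H)^{-1}H'X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(0)
T = 1000
z = rng.standard_normal((T, 2))
u = rng.standard_normal(T)
x1 = z[:, 0] + 0.5 * z[:, 1] + 0.8 * u + rng.standard_normal(T)
X = np.column_stack([np.ones(T), x1])
H = np.column_stack([np.ones(T), z])
y = X @ np.array([1.0, 2.0]) + u

theta_formula = two_sls(y, X, H)
X_hat = H @ ols(X, H)                        # first stage: regress X on H
theta_two_stage = ols(y, X_hat)              # second stage: regress y on X_hat
y_hat = H @ ols(y, H)                        # also project y on H ...
theta_fitted_on_fitted = ols(y_hat, X_hat)   # ... and regress fitted y on fitted X

# (A.1.18): V_hat = sigma_hat^2 [T^{-1} X'H(H'H)^{-1}H'X]^{-1}; residuals use X, not X_hat
resid = y - X @ theta_formula
sigma2_hat = resid @ resid / T
V_hat = sigma2_hat * np.linalg.inv(X.T @ H @ np.linalg.solve(H.T @ H, H.T @ X) / T)
std_err = np.sqrt(np.diag(V_hat) / T)        # V_hat is the variance of sqrt(T) * theta_hat
print(theta_formula, std_err)
```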
The variance-covariance matrix of 2SLS coefficient estimates, $V^*$, can
be estimated by substituting consistent estimates of the various moment
matrices into (A.1.15) to obtain

$$\hat V_T^* = \hat\sigma^2\big[T^{-1}X'H(H'H)^{-1}H'X\big]^{-1}, \qquad (A.1.18)$$

where $\hat\sigma^2$ is a consistent estimate of the variance of the equation error. This
formula is valid for just-identified IV and OLS coefficient estimates as well.

In place of the weighting matrix $W^*$ defined above, it is always possible
to use $kW^*$, where $k$ is any positive scalar. Similarly, in place of the weighting
matrix $W_T^*$ one can use $k_T W_T^*$, where $k_T$ is any positive scalar that converges
to $k$. This rescaling does not affect the formula for the instrumental variables
estimator (A.1.17). One possible choice for the scalar $k$ is $\sigma^{-2}$, the reciprocal
of the variance of the equation error $\epsilon_t$; this makes the weighting matrix
equal to $S^{-1}$. The corresponding choice for the scalar $k_T$ is some consistent
estimate $\hat\sigma^{-2}$ of $\sigma^{-2}$. Hansen (1982) has shown that with this scaling, $T$ times
the minimized value of the objective function is asymptotically distributed
$\chi^2$ with $(N_H - N_X)$ degrees of freedom under the null hypothesis that (A.1.1)
holds and the instruments are orthogonal to the equation error.

Hansen's test of the null hypothesis is related to the intuition that under
the null, the residual from the IV regression equation should be uncorrelated
with the instruments, and a regression of the residual on the instruments
should have a “small” $R^2$ statistic. To understand this, note that when
$W_T = (\hat\sigma^2 T^{-1}H'H)^{-1}$, the minimized objective function is

$$T^{-1}\epsilon(\hat\theta_T)'H(H'H)^{-1}H'\epsilon(\hat\theta_T)\,/\,\hat\sigma^2. \qquad (A.1.19)$$

Now consider regressing the residual $\epsilon(\hat\theta_T)$ on the instruments $H$. The fitted
value is $H(H'H)^{-1}H'\epsilon(\hat\theta_T)$, and the $R^2$ statistic of the regression converges
to the same limit as (A.1.19). Thus Hansen's result implies that $T$ times the
$R^2$ in a regression of the residual on the instruments is asymptotically $\chi^2$
with $(N_H - N_X)$ degrees of freedom. This is a standard test of overidentifying
restrictions in two-stage least squares (Engle [1984]).
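The equivalence between $T \cdot Q_T(\hat\theta_T)$ and $T$ times the $R^2$ of the residual-on-instruments regression can be verified numerically. The sketch uses the uncentered $R^2$, which matches (A.1.19) exactly. It also compares a valid instrument set with one that, by construction, is correlated with the error. The simulated design and function names are illustrative assumptions. To obtain a p-value, one would compare the statistic with a $\chi^2$ distribution with the returned degrees of freedom.

```python
import numpy as np

def two_sls(y, X, H):
    """Closed-form 2SLS estimator, equation (A.1.17)."""
    X_hat = H @ np.linalg.solve(H.T @ H, H.T @ X)
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def j_stat_tr2(y, X, H):
    """T times the uncentered R^2 of the 2SLS residual regressed on H.

    Returns the statistic and its degrees of freedom, N_H - N_X.
    """
    e = y - X @ two_sls(y, X, H)
    fitted = H @ np.linalg.lstsq(H, e, rcond=None)[0]
    return len(y) * (fitted @ fitted) / (e @ e), H.shape[1] - X.shape[1]

rng = np.random.default_rng(0)
T = 1000
z = rng.standard_normal((T, 2))
u = rng.standard_normal(T)
x1 = z[:, 0] + 0.5 * z[:, 1] + 0.8 * u + rng.standard_normal(T)
X = np.column_stack([np.ones(T), x1])
y = X @ np.array([1.0, 2.0]) + u

H_valid = np.column_stack([np.ones(T), z])                   # both instruments exogenous
H_bad = np.column_stack([np.ones(T), z[:, 0], z[:, 1] + u])  # second one correlated with u
j_valid, dof = j_stat_tr2(y, X, H_valid)
j_bad, _ = j_stat_tr2(y, X, H_bad)
print(f"J valid = {j_valid:.2f}, J invalid = {j_bad:.2f}, dof = {dof}")

# Same number as T * Q_T(theta_hat) with W_T = (sigma_hat^2 T^{-1} H'H)^{-1}
e = y - X @ two_sls(y, X, H_valid)
g = H_valid.T @ e / T
W = np.linalg.inv((e @ e / T) * (H_valid.T @ H_valid) / T)
j_via_q = T * g @ W @ g
```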
A.2 Generalized Method of Moments


The Generalized Method of Moments (Hansen [1982]) can be understood
as an extension of the linear IV regression we have discussed. Suppose now
that we have a model which defines a vector $\epsilon_t = \epsilon(x_t, \theta)$, where $x_t$ now
includes all the data relevant for the model (that is, we have dropped the
distinction between $y_t$ and $x_t$), and $\theta$ is a vector of $N_\theta$ coefficients. This
formulation generalizes the linear IV regression in three ways. First, $\epsilon(x_t, \theta)$
can be a column vector with $N_\epsilon$ elements rather than a scalar. Second,
$\epsilon(x_t, \theta)$ can be a nonlinear rather than a linear function of the data and the
parameters. Third, $\epsilon(x_t, \theta)$ can be heteroskedastic and serially correlated
rather than homoskedastic white noise. Our model tells us only that there
is some true set of parameters $\theta_0$ for which $\epsilon(x_t, \theta_0)$ is orthogonal to a set
of instruments; as before these are written in an $(N_H\times 1)$ column vector $h_t$.
By analogy with (A.1.2) we define


$$f_t(\theta) = h_t \otimes \epsilon(x_t, \theta). \qquad (A.2.1)$$

The notation $\otimes$ denotes the Kronecker product of the two vectors. That is, $f_t$
is a vector containing the cross-product of each instrument in $h_t$ with each
element of $\epsilon_t$; $f_t$ is therefore a column vector with $N_f = N_\epsilon N_H$ elements, and
the model implies by analogy with (A.1.3) that

$$E[f_t(\theta_0)] = 0. \qquad (A.2.2)$$

Just as in (A.1.4), we define a vector $g_T(\theta)$ containing the sample averages
corresponding to the elements of $f_t$ in (A.2.1):

$$g_T(\theta) = T^{-1}\sum_{t=1}^{T} f_t(\theta). \qquad (A.2.3)$$


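By analogy with (A.1.5), GMM minimizes the quadratic form

$$Q_T(\theta) = g_T(\theta)'\,W_T\,g_T(\theta). \qquad (A.2.4)$$

Since the problem is now nonlinear, this minimization must be performed
numerically. The first-order condition is

$$D_T(\hat\theta_T)'\,W_T\,g_T(\hat\theta_T) = 0, \qquad (A.2.5)$$

where $D_T(\theta)$ is a matrix of partial derivatives defined by

$$D_T(\theta) = \partial g_T(\theta)/\partial\theta'. \qquad (A.2.6)$$

The $(i, j)$ element of $D_T(\theta)$ is $\partial g_{T,i}(\theta)/\partial\theta_j$.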
Asymptotic Distribution Theory
The asymptotic distribution of the coefficient estimate $\hat\theta_T$ is

$$\sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{a} N(0, V), \qquad (A.2.7)$$

where

$$V = (D_0'WD_0)^{-1}D_0'WSWD_0(D_0'WD_0)^{-1}. \qquad (A.2.8)$$

These expressions are directly analogous to (A.1.12) and (A.1.13) for the
linear instrumental variables case. $D_0$ is a generalization of $M_{HX}$ in those
equations (in the linear case $D_0 = -M_{HX}$, and the sign cancels in (A.2.8)) and is
defined by $D_0 \equiv E[\partial f(x_t, \theta_0)/\partial\theta']$. $D_T(\hat\theta_T)$ converges asymptotically to $D_0$.
$S$ is defined as in (A.1.11) by

$$S = \lim_{T\to\infty}\mathrm{Var}\Big[T^{-1/2}\sum_{t=1}^{T} f_t(\theta_0)\Big] = \lim_{T\to\infty}\mathrm{Var}\big[T^{1/2}g_T(\theta_0)\big]. \qquad (A.2.9)$$

Optimal Weighting Matrix 

Just as in the linear IV case, the optimal weighting matrix that minimizes
$V$ is any positive scalar times $S^{-1}$. With an optimal weighting matrix the
asymptotic variance of $T^{1/2}$ times the coefficient estimate $\hat\theta_T$ is

$$V^* = \big(D_0'S^{-1}D_0\big)^{-1}. \qquad (A.2.10)$$

Also, when the weighting matrix $S^{-1}$ is used, $T$ times the minimized objective
function is asymptotically distributed $\chi^2$ with $(N_f - N_\theta)$ degrees of freedom, where $N_f$ is the
number of orthogonality conditions and $N_\theta$ is the number of parameters to
be estimated.


In practice, of course, $S$ and the other quantities in (A.2.8) must be
estimated. To do this, one starts with an arbitrary weighting matrix $W_T$; this
could be the identity matrix or could be chosen using some prior information
about the relative variances of the different orthogonality conditions.
Using $W_T$, one minimizes (A.2.4) to get an initial consistent estimate $\hat\theta_T$.
To estimate $V$ in (A.2.8), one replaces its elements by consistent estimates:
$D_0$ can be replaced by $D_T(\hat\theta_T)$, $W$ can be replaced by $W_T$, and $S$ can be
replaced by a consistent estimate $S_T(\hat\theta_T)$. Given these estimates, one can
construct a new weighting matrix $W_T^* = S_T(\hat\theta_T)^{-1}$ and minimize (A.2.4)
again to get a second-stage estimate $\hat\theta_T^*$. The asymptotic variance of $T^{1/2}$
times the second-stage estimate can be estimated as

$$\hat V_T^* = \big(D_T(\hat\theta_T^*)'\,S_T(\hat\theta_T)^{-1}\,D_T(\hat\theta_T^*)\big)^{-1}, \qquad (A.2.11)$$

and $T$ times the second-stage minimized objective function is distributed $\chi^2$ with
$(N_f - N_\theta)$ degrees of freedom. Although a two-stage procedure is asymptotically
efficient, it is also possible to iterate the procedure further until
the parameter estimates and minimized objective function converge. This
eliminates any dependence of the estimator on the initial weighting matrix,
and it appears to improve the finite-sample performance of GMM when the
number of parameters is large (Ferson and Foerster [1994]).
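The multistage procedure can be sketched as follows. To keep the example runnable without a numerical optimizer, it uses the linear model of Section A.1, where each minimization of (A.2.4) has the closed form (A.1.8). A genuinely nonlinear $\epsilon(x_t, \theta)$ would need a numerical minimizer at each stage. $S_T$ is estimated by $T^{-1}\sum_t f_t f_t'$, which assumes serially uncorrelated $f_t$; Section A.3 discusses the general case. The data-generating process, tolerances, and function names are our own assumptions.

```python
import numpy as np

def linear_gmm(y, X, H, W):
    """Minimizer of (A.2.4) for the linear model: equation (A.1.8)."""
    XH = X.T @ H
    return np.linalg.solve(XH @ W @ XH.T, XH @ W @ (H.T @ y))

def s_hat(H, e):
    """S_T = T^{-1} sum_t f_t f_t' with f_t = h_t e_t (no serial correlation assumed)."""
    f = H * e[:, None]
    return f.T @ f / len(e)

def iterated_gmm(y, X, H, max_iter=100, tol=1e-10):
    T = len(y)
    W = np.linalg.inv(H.T @ H / T)             # first stage: 2SLS weighting
    theta = linear_gmm(y, X, H, W)
    for n_iter in range(1, max_iter + 1):
        W = np.linalg.inv(s_hat(H, y - X @ theta))    # W_T = S_T(theta)^{-1}
        new_theta = linear_gmm(y, X, H, W)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    g = H.T @ (y - X @ theta) / T              # g_T(theta)
    D = -(H.T @ X) / T                         # D_T = dg_T / dtheta'
    V = np.linalg.inv(D.T @ W @ D)             # (A.2.11)
    se = np.sqrt(np.diag(V) / T)
    j_stat = T * g @ W @ g                     # approx chi-square(N_H - N_X) under the null
    return theta, se, j_stat, n_iter

rng = np.random.default_rng(0)
T = 1000
z = rng.standard_normal((T, 2))
u = rng.standard_normal(T) * (0.5 + np.abs(z[:, 0]))   # heteroskedastic error
x1 = z[:, 0] + 0.5 * z[:, 1] + 0.8 * u + rng.standard_normal(T)
X = np.column_stack([np.ones(T), x1])
H = np.column_stack([np.ones(T), z])
y = X @ np.array([1.0, 2.0]) + u
theta, se, j_stat, n_iter = iterated_gmm(y, X, H)
print(theta, se, round(float(j_stat), 3), n_iter)
```

Stopping after the first pass through the loop gives the two-stage estimator. Letting the loop run to convergence gives the iterated version mentioned in the text.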


A.3 Serially Correlated and Heteroskedastic Errors 


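One of the most important steps in implementing GMM estimation is estimating
the matrix $S$. From (A.2.9),

$$S = \lim_{T\to\infty} E\Big[T^{-1}\Big(\sum_{t=1}^{T} f_t(\theta_0)\Big)\Big(\sum_{t=1}^{T} f_t(\theta_0)\Big)'\Big] = \sum_{j=-\infty}^{\infty}\Gamma_j(\theta_0) = \Gamma_0(\theta_0) + \sum_{j=1}^{\infty}\big(\Gamma_j(\theta_0) + \Gamma_j(\theta_0)'\big), \qquad (A.3.1)$$

where

$$\Gamma_j(\theta_0) \equiv E\big[f_t(\theta_0)\,f_{t-j}(\theta_0)'\big] \qquad (A.3.2)$$

is the $j$th autocovariance matrix of $f_t(\theta_0)$. The matrix $S$ is the asymptotic
variance-covariance matrix of $T^{1/2}$ times the time-average of $f_t(\theta_0)$; equivalently, it is the spectral
density matrix of $f_t(\theta_0)$ at frequency zero. It can be written as an infinite
sum of autocovariance matrices of $f_t(\theta_0)$.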

If the autocovariances of $f_t(\theta_0)$ are zero beyond some lag, then one can
simplify the formula (A.3.1) for $S$ by dropping the zero autocovariances. The
autocovariances of $f_t(\theta_0)$ are zero if the corresponding autocovariances of
$\epsilon(x_t, \theta_0)$ are zero. In the linear IV case with serially uncorrelated errors
discussed earlier, for example, $\epsilon(x_t, \theta_0)$ is white noise and so $f_t(\theta_0)$ is white
noise; in this case one can drop all the autocovariances $\Gamma_j$ for $j > 0$, and $S$
is just $\Gamma_0$, the variance of $f_t(\theta_0)$. The same result holds in the consumption
CAPM with one-period returns studied in Chapter 8. However, in regressions
with $K$-period returns, like those studied in Chapter 7, $K-1$ autocovariances
of $f_t(\theta_0)$ are nonzero and the expression for $S$ is correspondingly more
complicated.


The Newey-West Estimator 
To estimate $S$ it would seem natural to replace the true autocovariances of
$f_t(\theta_0)$, $\Gamma_j(\theta_0)$, with sample autocovariances

$$\hat\Gamma_{j,T}(\hat\theta_T) = T^{-1}\sum_{t=j+1}^{T} f_t(\hat\theta_T)\,f_{t-j}(\hat\theta_T)' \qquad (A.3.3)$$

and substitute into (A.3.1). However, there are two difficulties that must
be faced. First, in a finite sample one can estimate only a finite number
of autocovariances, and to get a consistent estimator of $S$ one cannot allow
the number of estimated autocovariances to increase too rapidly with the
sample size. Second, there is no guarantee that an estimator of $S$ formed by
substituting (A.3.3) into (A.3.1) will be positive definite. To handle these
two problems Newey and West (1987) suggested the following estimator:

$$\hat S_T(\hat\theta_T, q) = \hat\Gamma_{0,T}(\hat\theta_T) + \sum_{j=1}^{q-1}\frac{q-j}{q}\Big(\hat\Gamma_{j,T}(\hat\theta_T) + \hat\Gamma_{j,T}(\hat\theta_T)'\Big), \qquad (A.3.4)$$

where $q$ increases with the sample size but not too rapidly; $q-1$ is the maximum
lag length that receives a nonzero weight in the Newey and West
(1987) estimator. The estimator guarantees positive definiteness by
downweighting higher-order autocovariances, and it is consistent because the
downweighting disappears asymptotically.
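Equations (A.3.3) and (A.3.4) can be implemented in a few lines. The sketch takes a $(T\times N_f)$ array whose row $t$ is $f_t(\hat\theta_T)'$ and, following (A.3.3), does not demean it. With $q = 1$ only $\hat\Gamma_0$ survives. The MA(1) test series and the function name are illustrative assumptions. For this series the autocovariances vanish beyond lag 1.

```python
import numpy as np

def newey_west(f, q):
    """Newey-West (1987) estimate of S, equation (A.3.4).

    f: (T x N_f) array, row t = f_t(theta_hat)'. Lags j = 1, ..., q-1 get
    weight (q - j)/q; q = 1 returns Gamma_0 alone.
    """
    T = f.shape[0]
    S = f.T @ f / T                        # Gamma_0
    for j in range(1, q):
        gamma_j = f[j:].T @ f[:-j] / T     # T^{-1} sum_{t=j+1}^{T} f_t f_{t-j}'
        S += (q - j) / q * (gamma_j + gamma_j.T)
    return S

rng = np.random.default_rng(3)
T = 500
shocks = rng.standard_normal((T + 1, 2))
f = shocks[1:] + 0.5 * shocks[:-1]         # MA(1): autocovariances vanish beyond lag 1
S_q1 = newey_west(f, q=1)
S_q5 = newey_west(f, q=5)
print(np.round(S_q1, 3), np.round(S_q5, 3), sep="\n")
```

For this MA(1) series the true $S$ is $2.25 I$, while $\Gamma_0$ alone is $1.25 I$. The output therefore shows how ignoring autocovariances understates $S$, and the Bartlett weights $(q-j)/q$ are what keep the estimate positive semidefinite.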

In models where autocovariances are known to be zero beyond lag $K-1$, it is tempting to use the Newey and West (1987) estimator with $q = K$. This is legitimate when $K = 1$, so that only the variance $\Gamma_0(\theta)$ appears in the estimator; but when $K > 1$ this approach can severely downweight some nonzero autocovariances (the lag-$(K-1)$ autocovariance, for example, receives weight only $1/K$). Depending on the sample size, it may be better to use $q > K$ in this situation.
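A quick way to see the downweighting problem is to tabulate the weights $(q-j)/q$ that (A.3.4) assigns to lags $0, \ldots, K-1$. The plain-Python sketch below uses a hypothetical $K = 4$; the helper `nw_weights` is ours.

```python
def nw_weights(q, max_lag):
    """Newey-West weight (q - j)/q on lag j = 0..max_lag; zero for j >= q."""
    return [max(q - j, 0) / q for j in range(max_lag + 1)]

K = 4  # hypothetical: autocovariances are nonzero up to lag K - 1 = 3
print(nw_weights(q=K, max_lag=K - 1))      # [1.0, 0.75, 0.5, 0.25]
print(nw_weights(q=2 * K, max_lag=K - 1))  # [1.0, 0.875, 0.75, 0.625]
```

With $q = K$ the lag-3 autocovariance keeps only a quarter of its weight; doubling $q$ raises that to five-eighths, at the cost of also including downweighted sample autocovariances at lags 4 through 7 that are known to be zero in population.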

Although the Newey and West (1987) weighting scheme is the most 
commonly used, there are several alternative estimators in the literature 
including those of Andrews (1991), Andrews and Monahan (1992), and
Gallant (1987). Hamilton (1994) provides a useful overview. 


The Linear Instrumental Variables Case 

The general formulas given here apply in both nonlinear and linear models, 
but they can be understood more simply in linear IV regression models. 
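Return to the linear model of Section A.1, but allow the error term $\epsilon(\theta_0)$ to be serially correlated and heteroskedastic. Equation (A.1.10) becomes

$$S = \lim_{T\to\infty}\mathrm{Var}\left[T^{-1/2} H'\epsilon(\theta_0)\right] = \lim_{T\to\infty} T^{-1} H'\Omega(\theta_0) H, \qquad (A.3.5)$$

where $\Omega(\theta_0)$ is the variance-covariance matrix of $\epsilon(\theta_0)$. This can be estimated by

$$\hat S_T(\hat\theta_T) = T^{-1} H'\hat\Omega_T(\hat\theta_T) H, \qquad (A.3.6)$$

where $\hat\Omega_T(\hat\theta_T)$ is an estimator of $\Omega(\theta_0)$. Equation (A.2.11) now becomes

$$V_T^* = \left(T^{-1} X'H\left[H'\hat\Omega_T(\hat\theta_T) H\right]^{-1} H'X\right)^{-1}. \qquad (A.3.7)$$

In the homoskedastic white noise case considered earlier, $\Omega = \sigma^2 I$, so we used an estimate $\hat\Omega_T(\hat\theta_T) = \hat\sigma^2 I$, where $\hat\sigma^2 = T^{-1}\sum_{t=1}^{T}\epsilon_t^2(\hat\theta_T)$. Substituting this into (A.3.7) gives (A.1.18).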
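A minimal NumPy sketch of (A.3.7) in the homoskedastic case follows. It assumes, for illustration only, that $\hat\theta_T$ is the two-stage least squares estimate and that the data are simulated; the function names and the data-generating process are hypothetical. With $\hat\Omega_T = \hat\sigma^2 I$, (A.3.7) reduces algebraically to $\hat\sigma^2\left(T^{-1}X'H(H'H)^{-1}H'X\right)^{-1}$, which the sketch can be used to confirm numerically.

```python
import numpy as np

def tsls(y, X, H):
    """Two-stage least squares (assumed first step): (X'P X)^{-1} X'P y with P = H (H'H)^{-1} H'."""
    P = H @ np.linalg.solve(H.T @ H, H.T)
    return np.linalg.solve(X.T @ P @ X, X.T @ P @ y)

def iv_asy_variance(X, H, Omega_hat):
    """Equation (A.3.7): V*_T = (T^{-1} X'H [H' Omega_hat H]^{-1} H'X)^{-1}."""
    T = X.shape[0]
    A = X.T @ H @ np.linalg.solve(H.T @ Omega_hat @ H, H.T @ X) / T
    return np.linalg.inv(A)

# Hypothetical simulated data: constant plus one regressor, three instruments.
rng = np.random.default_rng(1)
T = 500
H = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])
X = np.column_stack([np.ones(T), H[:, 1] + 0.5 * H[:, 2] + rng.standard_normal(T)])
y = X @ np.array([0.5, 2.0]) + rng.standard_normal(T)

theta_hat = tsls(y, X, H)
eps = y - X @ theta_hat
sigma2_hat = eps @ eps / T                                # sigma_hat^2 = T^{-1} sum eps_t^2
V_star = iv_asy_variance(X, H, sigma2_hat * np.eye(T))    # Omega_hat = sigma_hat^2 I
std_err = np.sqrt(np.diag(V_star) / T)                   # V* is the variance of sqrt(T)(theta_hat - theta_0)
```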
When the error term is serially uncorrelated but heteroskedastic, then $\Omega$ is a diagonal matrix with a different variance in each element of the main diagonal. One can construct a simple estimator of each element as follows. For each element on the main diagonal of the matrix, $\hat\Omega_{T,tt}(\hat\theta_T) = \epsilon_t^2(\hat\theta_T)$, while each off-diagonal element $\hat\Omega_{T,ts}(\hat\theta_T) = 0$ for $s \neq t$. Substituting the resulting matrix $\hat\Omega_T(\hat\theta_T)$ into (A.3.6), one gets a consistent estimator $\hat S_T(\hat\theta_T)$, and substituting it into (A.3.7), one gets a consistent estimator $V_T^*$. This is true even though the matrix $\hat\Omega_T(\hat\theta_T)$ is not itself a consistent estimator of $\Omega$, because the number of elements of $\Omega$ that must be estimated equals the sample size.
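In practice the $T \times T$ matrix $\hat\Omega_T$ need not be formed: with a diagonal $\hat\Omega_T$, (A.3.6) equals $T^{-1}\sum_t \epsilon_t^2(\hat\theta_T)\,h_t h_t'$, where $h_t'$ is the $t$th row of $H$. The NumPy sketch below assumes the residuals `eps` have already been computed; the simulated data and the name `white_S` are hypothetical.

```python
import numpy as np

def white_S(H, eps):
    """S_hat from (A.3.6) with Omega_hat = diag(eps_t^2), i.e. T^{-1} sum_t eps_t^2 h_t h_t',
    computed without forming the T x T matrix."""
    T = H.shape[0]
    return (H * (eps ** 2)[:, None]).T @ H / T

# Hypothetical data: errors whose variance rises with the absolute value of the instrument.
rng = np.random.default_rng(2)
T = 300
H = np.column_stack([np.ones(T), rng.standard_normal(T)])
eps = rng.standard_normal(T) * (1.0 + np.abs(H[:, 1]))
S_hat = white_S(H, eps)
```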

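When the error term is serially correlated and homoskedastic, then one can construct each element of the matrix $\hat\Omega_T(\hat\theta_T)$ as follows:

$$\hat\Omega_{T,ts}(\hat\theta_T) = \begin{cases}\left(\dfrac{q-|t-s|}{q}\right)\left(T^{-1}\displaystyle\sum_{u=|t-s|+1}^{T}\epsilon_u(\hat\theta_T)\,\epsilon_{u-|t-s|}(\hat\theta_T)\right) & \text{if } |t-s| < q,\\ 0 & \text{otherwise,}\end{cases} \qquad (A.3.8)$$

where the Newey and West (1987) weighting scheme with lag-truncation parameter $q$ is used. Alternatively, one can replace the triangular weights $(q-|t-s|)/q$ with unit weights to get the estimator of Hansen and Hodrick (1980), but this is not guaranteed to be positive definite.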
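The construction in (A.3.8) can be sketched directly in NumPy, assuming a vector `eps` of residuals $\epsilon_t(\hat\theta_T)$ is available. The function name and the simulated MA(2) errors are hypothetical; the `unit_weights` switch gives the Hansen and Hodrick (1980) variant described above.

```python
import numpy as np

def omega_serial_homoskedastic(eps, q, unit_weights=False):
    """Equation (A.3.8): Omega_ts = w(|t-s|) * gamma_hat(|t-s|) for |t-s| < q, else 0,
    with gamma_hat(k) = T^{-1} sum_{u=k+1}^{T} eps_u eps_{u-k}.

    Default weights are Newey-West, w(k) = (q - k)/q; unit_weights=True gives the
    Hansen-Hodrick (1980) variant, which need not be positive definite."""
    T = len(eps)
    gamma = np.array([eps[k:] @ eps[:T - k] for k in range(T)]) / T
    lag = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))   # |t - s|
    if unit_weights:
        w = np.where(lag < q, 1.0, 0.0)
    else:
        w = np.where(lag < q, (q - lag) / q, 0.0)
    return w * gamma[lag]

# Hypothetical MA(2) errors with T = 60 observations.
rng = np.random.default_rng(4)
eps = np.convolve(rng.standard_normal(62), np.ones(3), mode="valid")
Omega_nw = omega_serial_homoskedastic(eps, q=4)
Omega_hh = omega_serial_homoskedastic(eps, q=3, unit_weights=True)
```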
When the error term is serially correlated and heteroskedastic, then one can construct $\hat\Omega_T(\hat\theta_T)$ as:

$$\hat\Omega_{T,ts}(\hat\theta_T) = \begin{cases}\left(\dfrac{q-|t-s|}{q}\right)\epsilon_t(\hat\theta_T)\,\epsilon_s(\hat\theta_T) & \text{if } |t-s| < q,\\ 0 & \text{otherwise,}\end{cases} \qquad (A.3.9)$$

where the Newey and West (1987) weighting scheme is used. Again one can replace the triangular weights with unit weights to get the estimator of Hansen and Hodrick (1980). In each case, substituting $\hat\Omega_T(\hat\theta_T)$ into (A.3.6) gives a consistent estimate of $S$, and substituting it into equation (A.3.7) gives a consistent estimator $V_T^*$, even though the matrix $\hat\Omega_T(\hat\theta_T)$ is not itself a consistent estimator of $\Omega$, because the number of nonzero elements increases too rapidly with the sample size. White (1984) gives a comprehensive treatment of the linear model with serially correlated and heteroskedastic errors.
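The sketch below builds (A.3.9) in NumPy and checks a useful identity: with these weights, $T^{-1}H'\hat\Omega_T H$ from (A.3.6) is exactly the Newey and West (1987) estimator (A.3.4) applied to $f_t = h_t\,\epsilon_t(\hat\theta_T)$, where $h_t'$ is the $t$th row of $H$. Function names and the simulated heteroskedastic MA(2) errors are hypothetical.

```python
import numpy as np

def omega_hac(eps, q):
    """Equation (A.3.9): Omega_ts = ((q - |t-s|)/q) eps_t eps_s if |t-s| < q, else 0."""
    T = len(eps)
    lag = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    w = np.where(lag < q, (q - lag) / q, 0.0)
    return w * np.outer(eps, eps)

def S_from_omega(H, Omega):
    """Equation (A.3.6): S_hat = T^{-1} H' Omega_hat H."""
    return H.T @ Omega @ H / H.shape[0]

# Hypothetical data: heteroskedastic MA(2) errors, a constant plus one instrument.
rng = np.random.default_rng(3)
T, q = 120, 4
H = np.column_stack([np.ones(T), rng.standard_normal(T)])
eps = np.convolve(rng.standard_normal(T + 2), np.ones(3), mode="valid") * (1.0 + np.abs(H[:, 1]))
S_omega = S_from_omega(H, omega_hac(eps, q))

# Same quantity via (A.3.3) and (A.3.4) applied to f_t = h_t * eps_t.
f = H * eps[:, None]
S_nw = f.T @ f / T
for j in range(1, q):
    G = f[j:].T @ f[:T - j] / T
    S_nw = S_nw + (q - j) / q * (G + G.T)
print(np.allclose(S_omega, S_nw))   # True: the two constructions agree
```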


A.4 GMM and Maximum Likelihood 


Following Hamilton (1994), we now show how some well-known properties of Maximum Likelihood estimators (MLE) can be understood in relation to GMM. We first lay out some notation. We use $L_t$ to denote the density of $x_{t+1}$ conditional on the history of $x_t$ and a parameter vector $\theta$:

$$L_t(x_{t+1}, \theta) \equiv L(x_{t+1} \mid x_t, x_{t-1}, \ldots, \theta). \qquad (A.4.1)$$

We use the notation $\ell_t$ for the log of $L_t$, the conditional log likelihood:

$$\ell_t(x_{t+1}, \theta) \equiv \log L_t(x_{t+1}, \theta). \qquad (A.4.2)$$


The log likelihood $\mathcal{L}$ of the whole data set $x_1, \ldots, x_T$ is just the sum of the conditional log likelihoods:

$$\mathcal{L}(\theta) = \sum_{t=1}^{T}\ell_t. \qquad (A.4.3)$$


Since $L_t$ is a conditional density, it must integrate to 1:

$$\int L_t(x_{t+1}, \theta)\, dx_{t+1} = 1. \qquad (A.4.4)$$


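Given certain regularity conditions (which allow differentiation under the integral sign), it follows that the partial derivative of $L_t$ with respect to $\theta$ must integrate to zero. A series of simple manipulations then shows that

$$\begin{aligned} 0 &= \int \frac{\partial L_t(x_{t+1},\theta)}{\partial\theta}\, dx_{t+1} = \int \frac{\partial L_t(x_{t+1},\theta)/\partial\theta}{L_t(x_{t+1},\theta)}\, L_t(x_{t+1},\theta)\, dx_{t+1} \\ &= \int \frac{\partial \ell_t(x_{t+1},\theta)}{\partial\theta}\, L_t(x_{t+1},\theta)\, dx_{t+1} \\ &= \mathrm{E}_t\!\left[\frac{\partial \ell_t(x_{t+1},\theta)}{\partial\theta}\right]. \end{aligned} \qquad (A.4.5)$$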


The partial derivative of the conditional log likelihood with respect to the parameter vector, $\partial\ell_t(x_{t+1},\theta)/\partial\theta$, is a vector with $N_\theta$ elements. It is known as the score vector. From (A.4.5), it has conditional expectation zero when the data are generated by (A.4.1). By the law of iterated expectations it also has unconditional expectation zero, and thus plays a role analogous to the vector of orthogonality conditions $f_t$ in GMM analysis. The sample average $T^{-1}\sum_{t=1}^{T}\partial\ell_t(x_{t+1},\theta)/\partial\theta$ plays a role analogous to $g_T(\theta)$ in GMM analysis.

The maximum likelihood estimate of the parameters is just the solution $\hat\theta$ to $\max_\theta \mathcal{L}(\theta) = \sum_{t=1}^{T}\ell_t$. The first-order condition for the maximization can be written as

$$g_T(\hat\theta) \equiv T^{-1}\sum_{t=1}^{T}\frac{\partial\ell_t(x_{t+1},\hat\theta)}{\partial\theta} = 0, \qquad (A.4.6)$$

which also characterizes the GMM parameter estimate for a just-identified model. Thus the MLE is the same as GMM based on the orthogonality conditions in (A.4.5).
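As a minimal illustration, assume (hypothetically) that $x_t$ is i.i.d. $\mathcal{N}(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$, a special case of (A.4.1) in which the history carries no information. The score has elements $(x_t-\mu)/\sigma^2$ and $-1/(2\sigma^2) + (x_t-\mu)^2/(2\sigma^4)$, and the NumPy sketch below checks that the sample mean and the divide-by-$T$ sample variance set the average score, the analogue of $g_T$ in (A.4.6), to zero. The data and function name are illustrative assumptions.

```python
import numpy as np

def normal_scores(x, mu, s2):
    """Per-observation score vectors for theta = (mu, s2) in the i.i.d. N(mu, s2) model; shape (T, 2)."""
    d = x - mu
    return np.column_stack([d / s2, -0.5 / s2 + 0.5 * d ** 2 / s2 ** 2])

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=1000)    # hypothetical data
mu_hat, s2_hat = x.mean(), x.var()                # ML estimates (x.var() divides by T)
g_T = normal_scores(x, mu_hat, s2_hat).mean(axis=0)
print(np.allclose(g_T, 0.0))                      # True: the MLE solves the sample moment condition
```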


Asymptotic Distribution Theory


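The asymptotic distribution of ML parameter estimates is given by the following result:

$$\sqrt{T}(\hat\theta - \theta_0) \stackrel{a}{\sim} \mathcal{N}\left(0, \mathcal{I}^{-1}\right), \qquad (A.4.7)$$

where $\mathcal{I}$ is given by

$$\mathcal{I}(\theta_0) = -\lim_{T\to\infty}\mathrm{E}\left[T^{-1}\frac{\partial^2\mathcal{L}(\theta)}{\partial\theta\,\partial\theta'}\bigg|_{\theta=\theta_0}\right] \qquad (A.4.8)$$

and is known as the information matrix. $\mathcal{I}$ can be estimated by the sample counterpart

$$\hat{\mathcal{I}}_2 = -T^{-1}\sum_{t=1}^{T}\frac{\partial^2\ell_t(\hat\theta)}{\partial\theta\,\partial\theta'}. \qquad (A.4.9)$$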


The information matrix gives us a measure of the sensitivity of the value of the likelihood to the values of the parameters in the neighborhood of the maximum. If small changes in the parameters produce large changes in likelihood near the maximum, then the parameters can be precisely estimated. Since the likelihood function is flat at the maximum (its first derivative is zero there), the local sensitivity of the likelihood to the parameters is measured by the local curvature (the second derivative) of the likelihood with respect to the parameters, evaluated at the maximum.


Information-Matrix Equality 


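An alternative estimator of the information matrix, $\hat{\mathcal{I}}_1$, uses the average outer product or sample variance of the score vectors:

$$\hat{\mathcal{I}}_1 = T^{-1}\sum_{t=1}^{T}\frac{\partial\ell_t(\hat\theta)}{\partial\theta}\,\frac{\partial\ell_t(\hat\theta)}{\partial\theta'}. \qquad (A.4.10)$$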


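To see why $\hat{\mathcal{I}}_1$ converges to the same limit as $\hat{\mathcal{I}}_2$, differentiate the third equality of equation (A.4.5) with respect to $\theta'$, using $\partial L_t/\partial\theta' = L_t\,\partial\ell_t/\partial\theta'$ in the second term, to get

$$\begin{aligned} 0 &= \int \frac{\partial^2 \ell_t(x_{t+1},\theta)}{\partial\theta\,\partial\theta'}\, L_t(x_{t+1},\theta)\, dx_{t+1} + \int \frac{\partial \ell_t(x_{t+1},\theta)}{\partial\theta}\,\frac{\partial L_t(x_{t+1},\theta)}{\partial\theta'}\, dx_{t+1} \\ &= \int \frac{\partial^2 \ell_t(x_{t+1},\theta)}{\partial\theta\,\partial\theta'}\, L_t(x_{t+1},\theta)\, dx_{t+1} + \int \frac{\partial \ell_t(x_{t+1},\theta)}{\partial\theta}\,\frac{\partial \ell_t(x_{t+1},\theta)}{\partial\theta'}\, L_t(x_{t+1},\theta)\, dx_{t+1} \\ &= \mathrm{E}_t\!\left[\frac{\partial^2 \ell_t(x_{t+1},\theta)}{\partial\theta\,\partial\theta'}\right] + \mathrm{E}_t\!\left[\frac{\partial \ell_t(x_{t+1},\theta)}{\partial\theta}\,\frac{\partial \ell_t(x_{t+1},\theta)}{\partial\theta'}\right]. \end{aligned} \qquad (A.4.11)$$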
3800 20 00 
This is known as the information-matrix equality, and implies that the expectations to which the sample averages $\hat{\mathcal{I}}_1$ and $\hat{\mathcal{I}}_2$ converge are equal. The information-matrix equality holds under the assumption that the data are generated by (A.4.1).
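The information-matrix equality can be checked numerically in a hypothetical i.i.d. normal model with $\theta = (\mu, \sigma^2)$. The NumPy sketch below computes $\hat{\mathcal{I}}_2$ from (A.4.9) using the analytic Hessian of the normal log density and $\hat{\mathcal{I}}_1$ from (A.4.10); with correctly specified normal data and a large sample the two should nearly coincide. The simulated data and function names are assumptions for illustration.

```python
import numpy as np

def normal_scores(x, mu, s2):
    """Score vectors (T x 2) of the normal log density with theta = (mu, s2)."""
    d = x - mu
    return np.column_stack([d / s2, -0.5 / s2 + 0.5 * d ** 2 / s2 ** 2])

def normal_I2(x, mu, s2):
    """I_hat_2 in (A.4.9): minus the average Hessian of the normal log density."""
    d = x - mu
    avg_hessian = np.array([
        [-1.0 / s2, -np.mean(d) / s2 ** 2],
        [-np.mean(d) / s2 ** 2, 0.5 / s2 ** 2 - np.mean(d ** 2) / s2 ** 3],
    ])
    return -avg_hessian

def normal_I1(x, mu, s2):
    """I_hat_1 in (A.4.10): average outer product of the score vectors."""
    G = normal_scores(x, mu, s2)
    return G.T @ G / len(x)

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, scale=2.0, size=200_000)   # correctly specified: the data really are normal
mu_hat, s2_hat = x.mean(), x.var()                  # ML estimates
I1 = normal_I1(x, mu_hat, s2_hat)
I2 = normal_I2(x, mu_hat, s2_hat)
rel_gap = np.abs(I1 - I2).max() / np.abs(I2).max()  # small when the equality holds
```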

GMM analysis gives an alternative formula for the distribution of ML parameter estimates. Recall from (A.2.11) that the GMM estimator is asymptotically normal with asymptotic variance estimator

$$V_T^* = \left(D_T(\hat\theta_T)'\,\hat S_T^{-1}(\hat\theta_T)\,D_T(\hat\theta_T)\right)^{-1}.$$


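In this case

$$D_T(\hat\theta) = \frac{\partial g_T(\hat\theta)}{\partial\theta'} = T^{-1}\sum_{t=1}^{T}\frac{\partial^2\ell_t(\hat\theta)}{\partial\theta\,\partial\theta'} = -\hat{\mathcal{I}}_2, \qquad (A.4.12)$$

while

$$\hat S_T(\hat\theta) = T^{-1}\sum_{t=1}^{T}\frac{\partial\ell_t(\hat\theta)}{\partial\theta}\,\frac{\partial\ell_t(\hat\theta)}{\partial\theta'} = \hat{\mathcal{I}}_1, \qquad (A.4.13)$$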


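since the score vector is serially uncorrelated (it has conditional expectation zero by (A.4.5)), so $S$ can be estimated from its sample variance. Therefore, the distribution of the GMM estimator can be expressed as

$$\sqrt{T}(\hat\theta - \theta_0) \stackrel{a}{\sim} \mathcal{N}\left(0, \left[\mathcal{I}_2\,\mathcal{I}_1^{-1}\,\mathcal{I}_2\right]^{-1}\right), \qquad (A.4.14)$$

where $\mathcal{I}_1$ and $\mathcal{I}_2$ are the limits of $\hat{\mathcal{I}}_1$ and $\hat{\mathcal{I}}_2$ as $T$ increases without bound, evaluated at the true parameter vector $\theta_0$.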

When the model is correctly specified, $\hat{\mathcal{I}}_1$ and $\hat{\mathcal{I}}_2$ both converge to the information matrix $\mathcal{I}$; hence (A.4.14) simplifies in the limit to $(\mathcal{I}\,\mathcal{I}^{-1}\mathcal{I})^{-1} = \mathcal{I}^{-1}$, which is the conventional expression for the asymptotic variance in (A.4.7). Therefore, either $\hat{\mathcal{I}}_1$ or $\hat{\mathcal{I}}_2$ (or both) can be used to estimate $\mathcal{I}$ in this case.

However, when the model is misspecified, $\hat{\mathcal{I}}_1$ and $\hat{\mathcal{I}}_2$ converge to different limits in general; this has been used as the basis for a specification test by White (1982). But ML estimates of the misspecified model are still consistent provided that the orthogonality conditions hold, and one can use the general variance formula (A.4.14) provided that the score vector is serially uncorrelated. White (1982) suggests this approach, which is known as quasi maximum likelihood estimation.
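A minimal quasi maximum likelihood sketch: fit a (hypothetical) i.i.d. normal model by ML to fat-tailed Laplace data, so the model is misspecified, and compare the sandwich variance $(\hat{\mathcal{I}}_2\hat{\mathcal{I}}_1^{-1}\hat{\mathcal{I}}_2)^{-1}$ implied by (A.4.14) with the conventional $\hat{\mathcal{I}}_2^{-1}$. For this model the sandwich entry for $\hat\sigma^2$ works out to $m_4 - \hat\sigma^4$, where $m_4$ is the sample fourth central moment, whereas the conventional formula gives $2\hat\sigma^4$, which is right only when the fourth moment matches that of a normal distribution. Names and data are illustrative assumptions.

```python
import numpy as np

def normal_scores(x, mu, s2):
    """Score vectors (T x 2) of the normal log density with theta = (mu, s2)."""
    d = x - mu
    return np.column_stack([d / s2, -0.5 / s2 + 0.5 * d ** 2 / s2 ** 2])

def normal_I2(x, mu, s2):
    """I_hat_2 in (A.4.9): minus the average Hessian of the normal log density."""
    d = x - mu
    return -np.array([
        [-1.0 / s2, -np.mean(d) / s2 ** 2],
        [-np.mean(d) / s2 ** 2, 0.5 / s2 ** 2 - np.mean(d ** 2) / s2 ** 3],
    ])

def qmle_variances(x):
    """Normal QMLE of (mu, s2): sandwich (I2 I1^{-1} I2)^{-1} from (A.4.14) and conventional I2^{-1}."""
    mu, s2 = x.mean(), x.var()
    G = normal_scores(x, mu, s2)
    I1 = G.T @ G / len(x)
    I2 = normal_I2(x, mu, s2)
    sandwich = np.linalg.inv(I2 @ np.linalg.inv(I1) @ I2)
    conventional = np.linalg.inv(I2)
    return sandwich, conventional

rng = np.random.default_rng(6)
x = rng.laplace(loc=0.0, scale=1.0, size=5000)   # fat-tailed: the normal model is misspecified
V_sandwich, V_conventional = qmle_variances(x)
```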


Hypothesis Testing 
The asymptotic variances in (A.4.7) and (A.4.14) can be used in a straightforward manner to construct Wald tests of restrictions on the parameters. The idea of such tests is to see whether the unrestricted parameter estimates are significantly different from their restricted values, where the variance of the unrestricted estimates is calculated without imposing the restrictions.
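For linear restrictions $R\theta = r$, the usual Wald statistic is $T(R\hat\theta - r)'(R\hat V R')^{-1}(R\hat\theta - r)$, asymptotically $\chi^2$ with as many degrees of freedom as there are restrictions, where $\hat V$ estimates the asymptotic variance of $\sqrt{T}(\hat\theta - \theta_0)$ from (A.4.7) or (A.4.14). The NumPy sketch below implements this standard form; the function name and all numbers are hypothetical.

```python
import numpy as np

def wald_stat(theta_hat, V_hat, R, r, T):
    """Wald statistic T (R theta_hat - r)' (R V_hat R')^{-1} (R theta_hat - r) for H0: R theta = r.

    V_hat estimates the asymptotic variance of sqrt(T)(theta_hat - theta_0)."""
    diff = R @ theta_hat - r
    return float(T * diff @ np.linalg.solve(R @ V_hat @ R.T, diff))

# Hypothetical numbers: test theta_2 = 1 with T = 400 observations.
theta_hat = np.array([0.12, 0.9])
V_hat = np.array([[0.5, 0.1], [0.1, 0.8]])
W = wald_stat(theta_hat, V_hat, R=np.array([[0.0, 1.0]]), r=np.array([1.0]), T=400)
```

Here $W = 400 \times 0.1^2 / 0.8 = 5.0$, above 3.84, the 5% critical value of a $\chi^2(1)$ distribution, so the hypothetical restriction would be rejected at that level.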

Alternatively, one may want to test restrictions using estimates only of the restricted model. Once restrictions are imposed, the minimized GMM objective function is no longer identically zero. Instead, the Hansen (1982) result is that $T$ times the minimized objective function has a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions. In this case $T$ times the minimized objective function is just

$$T\,g_T(\tilde\theta)'\,\hat{\mathcal{I}}_1^{-1}\,g_T(\tilde\theta), \qquad (A.4.15)$$

where $\tilde\theta$ denotes the restricted estimate. This is the Lagrange multiplier test statistic for a restricted model estimated by maximum likelihood.
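A sketch of (A.4.15) for the hypothetical restriction $\mu = \mu_0$ in an i.i.d. normal model: the restricted ML estimate is $\tilde\theta = (\mu_0, \tilde\sigma^2)$ with $\tilde\sigma^2 = T^{-1}\sum_t (x_t-\mu_0)^2$, and the sketch evaluates both $g_T$ and $\hat{\mathcal{I}}_1$ at that point, so only the restricted model is estimated. Names and simulated data are illustrative assumptions.

```python
import numpy as np

def normal_scores(x, mu, s2):
    """Score vectors (T x 2) of the normal log density with theta = (mu, s2)."""
    d = x - mu
    return np.column_stack([d / s2, -0.5 / s2 + 0.5 * d ** 2 / s2 ** 2])

def lm_stat_mean(x, mu0):
    """(A.4.15) for H0: mu = mu0, i.e. T g_T' I1^{-1} g_T at the restricted MLE (mu0, s2_tilde)."""
    T = len(x)
    s2_tilde = np.mean((x - mu0) ** 2)      # restricted ML estimate of the variance
    G = normal_scores(x, mu0, s2_tilde)
    g = G.mean(axis=0)                      # average score; its s2 element is zero by construction
    I1 = G.T @ G / T                        # outer-product estimate (A.4.10) at the restricted estimate
    return float(T * g @ np.linalg.solve(I1, g))

rng = np.random.default_rng(7)
x = rng.normal(loc=1.0, scale=2.0, size=500)
lm_true = lm_stat_mean(x, 1.0)    # restriction true in the population
lm_false = lm_stat_mean(x, 0.0)   # restriction false
```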


The Delta Method 
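More complicated inferences for arbitrary nonlinear functions of the estimator $\hat\theta$ may be performed via Taylor's Theorem, or the delta method. If $\sqrt{T}(\hat\theta - \theta_0) \stackrel{a}{\sim} \mathcal{N}(0, V)$, then a nonlinear function $f(\hat\theta)$ has the following asymptotic distribution:

$$\sqrt{T}\left(f(\hat\theta) - f(\theta_0)\right) \stackrel{a}{\sim} \mathcal{N}(0, V_f), \qquad V_f = \frac{\partial f(\theta_0)}{\partial\theta'}\,V\left(\frac{\partial f(\theta_0)}{\partial\theta'}\right)', \qquad (A.4.16)$$

which follows from a first-order Taylor series approximation for $f(\hat\theta)$ around $\theta_0$. Higher-order terms converge to zero faster than $1/\sqrt{T}$; hence only the first term of the expansion matters for the asymptotic distribution of $f(\hat\theta)$.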
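As an illustration, take the Sharpe-ratio-like function $f(\theta) = \mu/\sigma$ of $\theta = (\mu, \sigma^2)$, our choice of example. Its gradient is $\left(1/\sigma,\; -\mu/(2\sigma^3)\right)$, and under i.i.d. normality the asymptotic variance of $\sqrt{T}$ times the estimation error in $(\hat\mu, \hat\sigma^2)$ is $\mathrm{diag}(\sigma^2, 2\sigma^4)$, so (A.4.16) gives $V_f = 1 + \tfrac{1}{2}(\mu/\sigma)^2$. The NumPy sketch below evaluates this at hypothetical parameter values; the function names are ours.

```python
import numpy as np

def delta_method_variance(grad, V):
    """V_f = (df/dtheta') V (df/dtheta')' from (A.4.16); grad has shape (k, N_theta) or (N_theta,)."""
    grad = np.atleast_2d(grad)
    return grad @ V @ grad.T

def sharpe_grad(mu, s2):
    """Gradient of f(mu, s2) = mu / sqrt(s2) with respect to (mu, s2)."""
    sigma = np.sqrt(s2)
    return np.array([1.0 / sigma, -0.5 * mu / sigma ** 3])

mu, s2 = 0.08, 0.04                    # hypothetical mean and variance of excess returns
V = np.diag([s2, 2.0 * s2 ** 2])       # asymptotic variance of (mu_hat, s2_hat) under i.i.d. normality
V_f = delta_method_variance(sharpe_grad(mu, s2), V)   # 1 x 1 matrix, equal to 1 + SR^2 / 2 = 1.08 here
```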